* [PATCH v2 00/19] RTAS maintenance
From: Nathan Lynch @ 2023-02-06 18:54 UTC
  To: Michael Ellerman, Nicholas Piggin, Christophe Leroy, Kajol Jain,
	Laurent Dufour, Mahesh J Salgaonkar, Andrew Donnellan,
	Nick Child
  Cc: linuxppc-dev, Nathan Lynch

Proposed changes for the RTAS subsystem and client code.

Fixes that are subject to backporting are at the front of the queue.
The rest of the queue is roughly ordered with respect to maturity:
i.e. patches that have already garnered some review and discussion
precede newer, more experimental changes.

Major features:

* Static tracepoints around RTAS entry/exit.
* An allocator for work area buffers.
* A new client of the work area allocator in the form of a
  higher-level API for PAPR system parameter retrieval.
* Constant-time symbolic RTAS function token lookups.

Tested with ppc64le in PowerVM LPARs and QEMU's pseries
model. Obsolescent RTAS platforms (chrp, cell, maple) get build
coverage.

---
Changes in v2:

* Drop applied patches:
  - powerpc/rtas: document rtas_call()
  - powerpc/rtasd: use correct OF API for event scan rate
  - powerpc/rtas: avoid device tree lookups in rtas_os_term()
  - powerpc/rtas: avoid scheduling in rtas_os_term()
  - powerpc/pseries/eeh: use correct API for error log size
  - powerpc/rtas: clean up rtas_error_log_max initialization
  - powerpc/rtas: clean up includes
  - powerpc/rtas: define pr_fmt and convert printk call sites
  - powerpc/rtas: mandate RTAS syscall filtering

* Additions:
  - Safe early-boot fallback in rtas_busy_delay().
  - Fixes for missed RTAS function call retries in various places.
  - Remove RTAS timebase sync from pseries, previously posted
    separately as "powerpc/pseries: drop RTAS-based timebase
    synchronization":
    https://lore.kernel.org/linuxppc-dev/20230110042845.121792-1-nathanl@linux.ibm.com/T/#u
  - RTAS work area buffer allocator.
  - Conversion of pseries DLPAR code to work area allocator.
  - A pseries-specific PAPR system parameter API built on top of the
    RTAS work area allocator.
  - Conversion of ibm,get-system-parameter users to papr_sysparm API.
  - New rtas_function_token() API and associated conversions.

* Modifications to existing patches:
  - Convert RTAS tracepoint definitions to unconditional
    variants (TRACE_EVENT_CONDITION() -> TRACE_EVENT()), dropping a
    cpu_online() check that duplicates work already done at the call
    site.
  - Skip tracepoints in unsafe contexts (real mode, CPU
    offline). (Nicholas Piggin)
  - Use bool bitfield for "banned on LE" function flag.
  - Better documentation for "banned on LE" function flag. (Andrew Donnellan)
  - Drop unnecessary cast for xa_load() key argument. (Nick Child)

* Link to v1: https://lore.kernel.org/r/20221118150751.469393-1-nathanl@linux.ibm.com

---
Nathan Lynch (19):
      powerpc/rtas: handle extended delays safely in early boot
      powerpc/perf/hv-24x7: add missing RTAS retry status handling
      powerpc/pseries/lpar: add missing RTAS retry status handling
      powerpc/pseries/lparcfg: add missing RTAS retry status handling
      powerpc/pseries/setup: add missing RTAS retry status handling
      powerpc/pseries: drop RTAS-based timebase synchronization
      powerpc/rtas: improve function information lookups
      powerpc/rtas: strengthen do_enter_rtas() type safety, drop inline
      powerpc/tracing: tracepoints for RTAS entry and exit
      powerpc/rtas: add tracepoints around RTAS entry
      powerpc/rtas: add work area allocator
      powerpc/pseries/dlpar: use RTAS work area API
      powerpc/pseries: PAPR system parameter API
      powerpc/pseries: convert CMO probe to papr_sysparm API
      powerpc/pseries/lparcfg: convert to papr_sysparm API
      powerpc/pseries/hv-24x7: convert to papr_sysparm API
      powerpc/pseries/lpar: convert to papr_sysparm API
      powerpc/rtas: introduce rtas_function_token() API
      powerpc/rtas: arch-wide function token lookup conversions

 arch/powerpc/include/asm/papr-sysparm.h       |  38 ++
 arch/powerpc/include/asm/rtas-work-area.h     |  45 ++
 arch/powerpc/include/asm/rtas.h               | 185 +++++
 arch/powerpc/include/asm/trace.h              | 103 +++
 arch/powerpc/kernel/Makefile                  |   3 +-
 arch/powerpc/kernel/rtas-proc.c               |  24 +-
 arch/powerpc/kernel/rtas-rtc.c                |   6 +-
 arch/powerpc/kernel/rtas-work-area.c          | 208 ++++++
 arch/powerpc/kernel/rtas.c                    | 942 +++++++++++++++++++++-----
 arch/powerpc/kernel/rtas_flash.c              |  21 +-
 arch/powerpc/kernel/rtas_pci.c                |   8 +-
 arch/powerpc/kernel/rtasd.c                   |   2 +-
 arch/powerpc/perf/hv-24x7.c                   |  45 +-
 arch/powerpc/platforms/52xx/efika.c           |   4 +-
 arch/powerpc/platforms/cell/ras.c             |   4 +-
 arch/powerpc/platforms/cell/smp.c             |   4 +-
 arch/powerpc/platforms/chrp/nvram.c           |   4 +-
 arch/powerpc/platforms/chrp/pci.c             |   4 +-
 arch/powerpc/platforms/chrp/setup.c           |   4 +-
 arch/powerpc/platforms/maple/setup.c          |   4 +-
 arch/powerpc/platforms/pseries/Makefile       |   2 +-
 arch/powerpc/platforms/pseries/dlpar.c        |  29 +-
 arch/powerpc/platforms/pseries/eeh_pseries.c  |  22 +-
 arch/powerpc/platforms/pseries/hotplug-cpu.c  |   4 +-
 arch/powerpc/platforms/pseries/io_event_irq.c |   2 +-
 arch/powerpc/platforms/pseries/lpar.c         |  37 +-
 arch/powerpc/platforms/pseries/lparcfg.c      | 104 +--
 arch/powerpc/platforms/pseries/mobility.c     |   4 +-
 arch/powerpc/platforms/pseries/msi.c          |   4 +-
 arch/powerpc/platforms/pseries/nvram.c        |   4 +-
 arch/powerpc/platforms/pseries/papr-sysparm.c | 151 +++++
 arch/powerpc/platforms/pseries/pci.c          |   2 +-
 arch/powerpc/platforms/pseries/ras.c          |   2 +-
 arch/powerpc/platforms/pseries/setup.c        |  27 +-
 arch/powerpc/platforms/pseries/smp.c          |  12 +-
 arch/powerpc/sysdev/xics/ics-rtas.c           |   8 +-
 arch/powerpc/xmon/xmon.c                      |  16 +-
 37 files changed, 1660 insertions(+), 428 deletions(-)
---
base-commit: 0bfb97203f5f300777624a2ad6f8f84aea3e8658
change-id: 20230125-b4-powerpc-rtas-queue-cf85ec465ff9

Best regards,
-- 
Nathan Lynch <nathanl@linux.ibm.com>



* [PATCH v2 01/19] powerpc/rtas: handle extended delays safely in early boot
From: Nathan Lynch @ 2023-02-06 18:54 UTC
  To: Michael Ellerman, Nicholas Piggin, Christophe Leroy, Kajol Jain,
	Laurent Dufour, Mahesh J Salgaonkar, Andrew Donnellan,
	Nick Child
  Cc: linuxppc-dev, Nathan Lynch

Some code that runs early in boot calls RTAS functions that can return
-2 or 990x statuses, which mean the caller should retry. An example is
pSeries_cmo_feature_init(), which invokes ibm,get-system-parameter but
treats these benign statuses as errors instead of retrying.

pSeries_cmo_feature_init() and similar code should be made to retry
until they succeed or receive a real error, using the usual pattern:

	do {
		rc = rtas_call(token, etc...);
	} while (rtas_busy_delay(rc));

But rtas_busy_delay() will perform a timed sleep on any 990x
status. This isn't safe so early in boot, before the CPU scheduler and
timer subsystem have initialized.

The -2 RTAS status is much more likely to occur during single-threaded
boot than 990x in practice, at least on PowerVM. This is because -2
usually means that RTAS made progress but exhausted its self-imposed
timeslice, while 990x is associated with concurrent requests from the
OS causing internal contention. Regardless, according to the language
in PAPR, the OS should be prepared to handle either type of status at
any time.

Add a fallback path to rtas_busy_delay() to handle this as safely as
possible, performing a small delay on 990x. Include a counter to
detect retry loops that aren't making progress and bail out.

This was found by inspection and I'm not aware of any real
failures. However, the implementation of rtas_busy_delay() before
commit 38f7b7067dae ("powerpc/rtas: rtas_busy_delay() improvements")
was not susceptible to this problem, so let's treat this as a
regression.

Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
Fixes: 38f7b7067dae ("powerpc/rtas: rtas_busy_delay() improvements")
---
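Illustration only, not part of this patch: a minimal user-space sketch
of the early-boot fallback logic described in the commit message. The
sketch_ names and the explicit state struct are hypothetical. The
9900..9905 range for the 990x statuses is an assumption based on the
RTAS_EXTENDED_DELAY_MIN/MAX constants, and the kernel's mdelay(1) is
reduced to a comment so the code builds outside the kernel.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumed status values: -2 is "busy", 9900..9905 are extended delays. */
#define SKETCH_RTAS_BUSY	(-2)
#define SKETCH_EXT_DELAY_MIN	9900
#define SKETCH_EXT_DELAY_MAX	9905
#define SKETCH_MAX_EXT_DELAYS	1000

/* Hypothetical stand-in for the function-local static counter. */
struct sketch_delay_state {
	size_t successive_ext_delays;
};

/*
 * Returns true if the caller should retry. Consecutive extended-delay
 * statuses are counted; once the count exceeds the limit, report false
 * so that the caller's retry loop terminates.
 */
static bool sketch_busy_delay_early(struct sketch_delay_state *st, int status)
{
	assert(st != NULL);

	if (status >= SKETCH_EXT_DELAY_MIN && status <= SKETCH_EXT_DELAY_MAX) {
		/* The kernel version performs mdelay(1) at this point. */
		st->successive_ext_delays += 1;
		return st->successive_ext_delays <= SKETCH_MAX_EXT_DELAYS;
	}

	/* Any other status breaks the run of extended delays. */
	st->successive_ext_delays = 0;
	return status == SKETCH_RTAS_BUSY;
}
```

With a zero-initialized state, 1000 consecutive calls with status 9900
each ask for a retry, the next consecutive one does not, and any other
status (including -2) resets the counter.
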
 arch/powerpc/kernel/rtas.c | 48 +++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 47 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/kernel/rtas.c b/arch/powerpc/kernel/rtas.c
index 795225d7f138..ec2df09a70cf 100644
--- a/arch/powerpc/kernel/rtas.c
+++ b/arch/powerpc/kernel/rtas.c
@@ -606,6 +606,46 @@ unsigned int rtas_busy_delay_time(int status)
 	return ms;
 }
 
+/*
+ * Early boot fallback for rtas_busy_delay().
+ */
+static bool __init rtas_busy_delay_early(int status)
+{
+	static size_t successive_ext_delays __initdata;
+	bool ret;
+
+	switch (status) {
+	case RTAS_EXTENDED_DELAY_MIN...RTAS_EXTENDED_DELAY_MAX:
+		/*
+		 * In the unlikely case that we receive an extended
+		 * delay status in early boot, the OS is probably not
+		 * the cause, and there's nothing we can do to clear
+		 * the condition. Best we can do is delay for a bit
+		 * and hope it's transient. Lie to the caller if it
+		 * seems like we're stuck in a retry loop.
+		 */
+		mdelay(1);
+		ret = true;
+		successive_ext_delays += 1;
+		if (successive_ext_delays > 1000) {
+			pr_err("too many extended delays, giving up\n");
+			dump_stack();
+			ret = false;
+		}
+		break;
+	case RTAS_BUSY:
+		ret = true;
+		successive_ext_delays = 0;
+		break;
+	default:
+		ret = false;
+		successive_ext_delays = 0;
+		break;
+	}
+
+	return ret;
+}
+
 /**
  * rtas_busy_delay() - helper for RTAS busy and extended delay statuses
  *
@@ -624,11 +664,17 @@ unsigned int rtas_busy_delay_time(int status)
  * * false - @status is not @RTAS_BUSY nor an extended delay hint. The
  *           caller is responsible for handling @status.
  */
-bool rtas_busy_delay(int status)
+bool __ref rtas_busy_delay(int status)
 {
 	unsigned int ms;
 	bool ret;
 
+	/*
+	 * Can't do timed sleeps before timekeeping is up.
+	 */
+	if (system_state < SYSTEM_SCHEDULING)
+		return rtas_busy_delay_early(status);
+
 	switch (status) {
 	case RTAS_EXTENDED_DELAY_MIN...RTAS_EXTENDED_DELAY_MAX:
 		ret = true;

-- 
2.39.1



* [PATCH v2 02/19] powerpc/perf/hv-24x7: add missing RTAS retry status handling
From: Nathan Lynch via B4 Submission Endpoint @ 2023-02-06 18:54 UTC
  To: Michael Ellerman, Nicholas Piggin, Christophe Leroy, Kajol Jain,
	Laurent Dufour, Mahesh J Salgaonkar, Andrew Donnellan,
	Nick Child
  Cc: Nathan Lynch, linuxppc-dev

From: Nathan Lynch <nathanl@linux.ibm.com>

The ibm,get-system-parameter RTAS function may return -2 or 990x,
which indicate that the caller should try again. read_24x7_sys_info()
ignores this, allowing transient failures in reporting processor
module information.

Move the RTAS call into a conventional rtas_busy_delay()-based loop,
along with the parsing of results on success.

Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
Fixes: 8ba214267382 ("powerpc/hv-24x7: Add rtas call in hv-24x7 driver to get processor details")
---
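Illustration only, not part of this patch: a user-space sketch of the
result parsing that the patch moves inside the retry loop. It reads
big-endian 16-bit fields at byte offsets 0 (length), 2 (number of
types), 4, 6 and 8, and accepts the data only when the length is at
least 8 and the type count is nonzero, mirroring the checks in the
diff. The sketch_ names, the result struct and the extra buffer-size
check are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical container for the three values the driver records. */
struct sketch_module_info {
	uint16_t sockets;
	uint16_t chips_per_socket;
	uint16_t cores_per_chip;
};

/* Read a big-endian 16-bit value, independent of host endianness. */
static uint16_t sketch_be16_at(const unsigned char *buf, size_t off)
{
	return (uint16_t)((buf[off] << 8) | buf[off + 1]);
}

/*
 * Returns true and fills *out if the buffer holds usable data;
 * otherwise leaves *out untouched so the caller's defaults survive.
 */
static bool sketch_parse_module_info(const unsigned char *buf, size_t buflen,
				     struct sketch_module_info *out)
{
	uint16_t len, ntypes;

	assert(buf != NULL && out != NULL);

	/* Offsets 0..9 are read below; this bound is the sketch's own. */
	if (buflen < 10)
		return false;

	len = sketch_be16_at(buf, 0);
	ntypes = sketch_be16_at(buf, 2);
	if (len < 8 || ntypes == 0)
		return false;

	out->sockets = sketch_be16_at(buf, 4);
	out->chips_per_socket = sketch_be16_at(buf, 6);
	out->cores_per_chip = sketch_be16_at(buf, 8);
	return true;
}
```

A caller would preset all three fields to 1, as read_24x7_sys_info()
does, and keep those defaults whenever the parse is rejected.
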
 arch/powerpc/perf/hv-24x7.c | 42 ++++++++++++++++++------------------------
 1 file changed, 18 insertions(+), 24 deletions(-)

diff --git a/arch/powerpc/perf/hv-24x7.c b/arch/powerpc/perf/hv-24x7.c
index 33c23225fd54..fcfebf5bd378 100644
--- a/arch/powerpc/perf/hv-24x7.c
+++ b/arch/powerpc/perf/hv-24x7.c
@@ -79,9 +79,8 @@ static u32 phys_coresperchip; /* Physical cores per chip */
  */
 void read_24x7_sys_info(void)
 {
-	int call_status, len, ntypes;
-
-	spin_lock(&rtas_data_buf_lock);
+	const s32 token = rtas_token("ibm,get-system-parameter");
+	int call_status;
 
 	/*
 	 * Making system parameter: chips and sockets and cores per chip
@@ -91,32 +90,27 @@ void read_24x7_sys_info(void)
 	phys_chipspersocket = 1;
 	phys_coresperchip = 1;
 
-	call_status = rtas_call(rtas_token("ibm,get-system-parameter"), 3, 1,
-				NULL,
-				PROCESSOR_MODULE_INFO,
-				__pa(rtas_data_buf),
-				RTAS_DATA_BUF_SIZE);
+	do {
+		spin_lock(&rtas_data_buf_lock);
+		call_status = rtas_call(token, 3, 1, NULL, PROCESSOR_MODULE_INFO,
+					__pa(rtas_data_buf), RTAS_DATA_BUF_SIZE);
+		if (call_status == 0) {
+			int ntypes = be16_to_cpup((__be16 *)&rtas_data_buf[2]);
+			int len = be16_to_cpup((__be16 *)&rtas_data_buf[0]);
+
+			if (len >= 8 && ntypes != 0) {
+				phys_sockets = be16_to_cpup((__be16 *)&rtas_data_buf[4]);
+				phys_chipspersocket = be16_to_cpup((__be16 *)&rtas_data_buf[6]);
+				phys_coresperchip = be16_to_cpup((__be16 *)&rtas_data_buf[8]);
+			}
+		}
+		spin_unlock(&rtas_data_buf_lock);
+	} while (rtas_busy_delay(call_status));
 
 	if (call_status != 0) {
 		pr_err("Error calling get-system-parameter %d\n",
 		       call_status);
-	} else {
-		len = be16_to_cpup((__be16 *)&rtas_data_buf[0]);
-		if (len < 8)
-			goto out;
-
-		ntypes = be16_to_cpup((__be16 *)&rtas_data_buf[2]);
-
-		if (!ntypes)
-			goto out;
-
-		phys_sockets = be16_to_cpup((__be16 *)&rtas_data_buf[4]);
-		phys_chipspersocket = be16_to_cpup((__be16 *)&rtas_data_buf[6]);
-		phys_coresperchip = be16_to_cpup((__be16 *)&rtas_data_buf[8]);
 	}
-
-out:
-	spin_unlock(&rtas_data_buf_lock);
 }
 
 /* Domains for which more than one result element are returned for each event. */

-- 
2.39.1



* [PATCH v2 03/19] powerpc/pseries/lpar: add missing RTAS retry status handling
From: Nathan Lynch @ 2023-02-06 18:54 UTC
  To: Michael Ellerman, Nicholas Piggin, Christophe Leroy, Kajol Jain,
	Laurent Dufour, Mahesh J Salgaonkar, Andrew Donnellan,
	Nick Child
  Cc: linuxppc-dev, Nathan Lynch

The ibm,get-system-parameter RTAS function may return -2 or 990x,
which indicate that the caller should try again.

pseries_lpar_read_hblkrm_characteristics() ignores this, making it
possible to incorrectly detect TLB block invalidation characteristics
at boot.

Move the RTAS call into a conventional rtas_busy_delay()-based loop.

Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
Fixes: 1211ee61b4a8 ("powerpc/pseries: Read TLB Block Invalidate Characteristics")
---
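Illustration only, not part of this patch: a user-space sketch of the
retry loop shape used here, where the shared buffer is cleared, the
call is made, and the result is copied out and NUL-terminated on every
iteration until the status is neither -2 nor 990x. The firmware call
is replaced by a caller-supplied function pointer, and all sketch_
names, the fake firmware, and the 9900..9905 range are assumptions.
Locking and the delay performed by rtas_busy_delay() are omitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_BUF_SIZE 16

/* Hypothetical stand-in for an RTAS call that fills a buffer. */
typedef int (*sketch_fw_call)(void *ctx, char *buf, size_t size);

/* True for the statuses that mean "try again" (no sleeping here). */
static bool sketch_should_retry(int status)
{
	return status == -2 || (status >= 9900 && status <= 9905);
}

/* Repeat the call until it yields a final status; return that status. */
static int sketch_get_param(sketch_fw_call call, void *ctx,
			    char *out, size_t outlen)
{
	char shared[SKETCH_BUF_SIZE];	/* plays the role of rtas_data_buf */
	int status;

	assert(call != NULL && out != NULL);
	assert(outlen > 0 && outlen <= SKETCH_BUF_SIZE);

	do {
		memset(shared, 0, sizeof(shared));
		status = call(ctx, shared, sizeof(shared));
		memcpy(out, shared, outlen);
		out[outlen - 1] = '\0';
	} while (sketch_should_retry(status));

	return status;
}

/* Fake firmware: reports busy a fixed number of times, then succeeds. */
struct sketch_fake_fw {
	int busy_left;
	int calls;
};

static int sketch_fake_call(void *ctx, char *buf, size_t size)
{
	struct sketch_fake_fw *fw = ctx;

	fw->calls++;
	if (fw->busy_left > 0) {
		fw->busy_left--;
		return -2;
	}
	strncpy(buf, "hello", size - 1);
	return 0;
}
```

With busy_left set to 2, sketch_get_param() invokes the fake call
three times and returns 0 with the copied string in the output buffer.
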
 arch/powerpc/platforms/pseries/lpar.c | 20 ++++++++++----------
 1 file changed, 10 insertions(+), 10 deletions(-)

diff --git a/arch/powerpc/platforms/pseries/lpar.c b/arch/powerpc/platforms/pseries/lpar.c
index 97ef6499e501..6597b2126ebb 100644
--- a/arch/powerpc/platforms/pseries/lpar.c
+++ b/arch/powerpc/platforms/pseries/lpar.c
@@ -1481,22 +1481,22 @@ static inline void __init check_lp_set_hblkrm(unsigned int lp,
 
 void __init pseries_lpar_read_hblkrm_characteristics(void)
 {
+	const s32 token = rtas_token("ibm,get-system-parameter");
 	unsigned char local_buffer[SPLPAR_TLB_BIC_MAXLENGTH];
 	int call_status, len, idx, bpsize;
 
 	if (!firmware_has_feature(FW_FEATURE_BLOCK_REMOVE))
 		return;
 
-	spin_lock(&rtas_data_buf_lock);
-	memset(rtas_data_buf, 0, RTAS_DATA_BUF_SIZE);
-	call_status = rtas_call(rtas_token("ibm,get-system-parameter"), 3, 1,
-				NULL,
-				SPLPAR_TLB_BIC_TOKEN,
-				__pa(rtas_data_buf),
-				RTAS_DATA_BUF_SIZE);
-	memcpy(local_buffer, rtas_data_buf, SPLPAR_TLB_BIC_MAXLENGTH);
-	local_buffer[SPLPAR_TLB_BIC_MAXLENGTH - 1] = '\0';
-	spin_unlock(&rtas_data_buf_lock);
+	do {
+		spin_lock(&rtas_data_buf_lock);
+		memset(rtas_data_buf, 0, RTAS_DATA_BUF_SIZE);
+		call_status = rtas_call(token, 3, 1, NULL, SPLPAR_TLB_BIC_TOKEN,
+					__pa(rtas_data_buf), RTAS_DATA_BUF_SIZE);
+		memcpy(local_buffer, rtas_data_buf, SPLPAR_TLB_BIC_MAXLENGTH);
+		local_buffer[SPLPAR_TLB_BIC_MAXLENGTH - 1] = '\0';
+		spin_unlock(&rtas_data_buf_lock);
+	} while (rtas_busy_delay(call_status));
 
 	if (call_status != 0) {
 		pr_warn("%s %s Error calling get-system-parameter (0x%x)\n",

-- 
2.39.1



* [PATCH v2 04/19] powerpc/pseries/lparcfg: add missing RTAS retry status handling
  2023-02-06 18:54 ` Nathan Lynch via B4 Submission Endpoint
@ 2023-02-06 18:54   ` Nathan Lynch
  -1 siblings, 0 replies; 50+ messages in thread
From: Nathan Lynch via B4 Submission Endpoint @ 2023-02-06 18:54 UTC (permalink / raw)
  To: Michael Ellerman, Nicholas Piggin, Christophe Leroy, Kajol Jain,
	Laurent Dufour, Mahesh J Salgaonkar, Andrew Donnellan,
	Nick Child
  Cc: Nathan Lynch, linuxppc-dev

From: Nathan Lynch <nathanl@linux.ibm.com>

The ibm,get-system-parameter RTAS function may return -2 or 990x,
which indicate that the caller should try again.

lparcfg's parse_system_parameter_string() ignores this, making it
possible to intermittently report incorrect SPLPAR characteristics.

Move the RTAS call into a conventional rtas_busy_delay()-based loop.

Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
---
 arch/powerpc/platforms/pseries/lparcfg.c | 20 ++++++++++----------
 1 file changed, 10 insertions(+), 10 deletions(-)

diff --git a/arch/powerpc/platforms/pseries/lparcfg.c b/arch/powerpc/platforms/pseries/lparcfg.c
index 63fd925ccbb8..cd33d5800763 100644
--- a/arch/powerpc/platforms/pseries/lparcfg.c
+++ b/arch/powerpc/platforms/pseries/lparcfg.c
@@ -408,6 +408,7 @@ static void read_lpar_name(struct seq_file *m)
  */
 static void parse_system_parameter_string(struct seq_file *m)
 {
+	const s32 token = rtas_token("ibm,get-system-parameter");
 	int call_status;
 
 	unsigned char *local_buffer = kmalloc(SPLPAR_MAXLENGTH, GFP_KERNEL);
@@ -417,16 +418,15 @@ static void parse_system_parameter_string(struct seq_file *m)
 		return;
 	}
 
-	spin_lock(&rtas_data_buf_lock);
-	memset(rtas_data_buf, 0, SPLPAR_MAXLENGTH);
-	call_status = rtas_call(rtas_token("ibm,get-system-parameter"), 3, 1,
-				NULL,
-				SPLPAR_CHARACTERISTICS_TOKEN,
-				__pa(rtas_data_buf),
-				RTAS_DATA_BUF_SIZE);
-	memcpy(local_buffer, rtas_data_buf, SPLPAR_MAXLENGTH);
-	local_buffer[SPLPAR_MAXLENGTH - 1] = '\0';
-	spin_unlock(&rtas_data_buf_lock);
+	do {
+		spin_lock(&rtas_data_buf_lock);
+		memset(rtas_data_buf, 0, SPLPAR_MAXLENGTH);
+		call_status = rtas_call(token, 3, 1, NULL, SPLPAR_CHARACTERISTICS_TOKEN,
+					__pa(rtas_data_buf), RTAS_DATA_BUF_SIZE);
+		memcpy(local_buffer, rtas_data_buf, SPLPAR_MAXLENGTH);
+		local_buffer[SPLPAR_MAXLENGTH - 1] = '\0';
+		spin_unlock(&rtas_data_buf_lock);
+	} while (rtas_busy_delay(call_status));
 
 	if (call_status != 0) {
 		printk(KERN_INFO

-- 
2.39.1


^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [PATCH v2 05/19] powerpc/pseries/setup: add missing RTAS retry status handling
  2023-02-06 18:54 ` Nathan Lynch via B4 Submission Endpoint
@ 2023-02-06 18:54   ` Nathan Lynch via B4 Submission Endpoint
  -1 siblings, 0 replies; 50+ messages in thread
From: Nathan Lynch @ 2023-02-06 18:54 UTC (permalink / raw)
  To: Michael Ellerman, Nicholas Piggin, Christophe Leroy, Kajol Jain,
	Laurent Dufour, Mahesh J Salgaonkar, Andrew Donnellan,
	Nick Child
  Cc: linuxppc-dev, Nathan Lynch

The ibm,get-system-parameter RTAS function may return -2 or 990x,
which indicate that the caller should try again.

pSeries_cmo_feature_init() ignores this, making it possible to fail to
detect cooperative memory overcommit capabilities during boot.

Move the RTAS call into a conventional rtas_busy_delay()-based
loop, dropping unnecessary clearing of rtas_data_buf.

Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
---
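Reviewer note, not part of the patch: this loop differs from the ones in
the preceding patches in that it breaks out with rtas_data_buf_lock
still held on success, presumably so the function can go on to parse
rtas_data_buf before unlocking. The userspace sketch below models only
that lock balance. All names in it (lock_depth, buf_lock(), buf_unlock(),
retry_wanted(), query_holding_lock_on_success()) are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-in for rtas_data_buf_lock: 1 while held, 0 otherwise. */
static int lock_depth;

static void buf_lock(void)
{
	assert(lock_depth == 0);
	lock_depth++;
}

static void buf_unlock(void)
{
	assert(lock_depth == 1);
	lock_depth--;
}

/* Assumption: -2 and 9900..9905 are the "try again" statuses. */
static bool retry_wanted(int status)
{
	return status == -2 || (status >= 9900 && status <= 9905);
}

/*
 * Consume statuses from @script until one is final. The script must
 * end with a non-retry status. On return:
 *   status == 0  -> lock is still held, caller must unlock
 *   status != 0  -> lock has been released
 */
static int query_holding_lock_on_success(const int *script, int *calls)
{
	int status;

	*calls = 0;
	do {
		buf_lock();
		status = script[(*calls)++];
		if (status == 0)
			break;	/* keep the lock for the caller */
		buf_unlock();
	} while (retry_wanted(status));

	return status;
}
```

The lock is dropped before every retry delay, so no wait happens with
the lock held, and both exit paths leave it in a well-defined state.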
 arch/powerpc/platforms/pseries/setup.c | 20 ++++++++++++--------
 1 file changed, 12 insertions(+), 8 deletions(-)

diff --git a/arch/powerpc/platforms/pseries/setup.c b/arch/powerpc/platforms/pseries/setup.c
index 8ef3270515a9..74e50b6b28d4 100644
--- a/arch/powerpc/platforms/pseries/setup.c
+++ b/arch/powerpc/platforms/pseries/setup.c
@@ -941,21 +941,25 @@ void pSeries_coalesce_init(void)
  */
 static void __init pSeries_cmo_feature_init(void)
 {
+	const s32 token = rtas_token("ibm,get-system-parameter");
 	char *ptr, *key, *value, *end;
 	int call_status;
 	int page_order = IOMMU_PAGE_SHIFT_4K;
 
 	pr_debug(" -> fw_cmo_feature_init()\n");
-	spin_lock(&rtas_data_buf_lock);
-	memset(rtas_data_buf, 0, RTAS_DATA_BUF_SIZE);
-	call_status = rtas_call(rtas_token("ibm,get-system-parameter"), 3, 1,
-				NULL,
-				CMO_CHARACTERISTICS_TOKEN,
-				__pa(rtas_data_buf),
-				RTAS_DATA_BUF_SIZE);
 
-	if (call_status != 0) {
+	do {
+		spin_lock(&rtas_data_buf_lock);
+		call_status = rtas_call(token, 3, 1, NULL,
+					CMO_CHARACTERISTICS_TOKEN,
+					__pa(rtas_data_buf),
+					RTAS_DATA_BUF_SIZE);
+		if (call_status == 0)
+			break;
 		spin_unlock(&rtas_data_buf_lock);
+	} while (rtas_busy_delay(call_status));
+
+	if (call_status != 0) {
 		pr_debug("CMO not available\n");
 		pr_debug(" <- fw_cmo_feature_init()\n");
 		return;

-- 
2.39.1


^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [PATCH v2 06/19] powerpc/pseries: drop RTAS-based timebase synchronization
  2023-02-06 18:54 ` Nathan Lynch via B4 Submission Endpoint
@ 2023-02-06 18:54   ` Nathan Lynch via B4 Submission Endpoint
  -1 siblings, 0 replies; 50+ messages in thread
From: Nathan Lynch @ 2023-02-06 18:54 UTC (permalink / raw)
  To: Michael Ellerman, Nicholas Piggin, Christophe Leroy, Kajol Jain,
	Laurent Dufour, Mahesh J Salgaonkar, Andrew Donnellan,
	Nick Child
  Cc: linuxppc-dev, Nathan Lynch

The pseries platform has been LPAR-only for several generations, and
the PAPR spec:

* Guarantees that timebase synchronization is performed by
  the platform ("The timebase registers are synchronized by the
  platform before CPUs are given to the OS" - 7.3.8 SMP Support).

* Completely omits the freeze-time-base and thaw-time-base RTAS
  functions, which are CHRP artifacts.

This code is effectively unused on currently supported models, so drop
it.

Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
---
 arch/powerpc/platforms/pseries/smp.c | 6 ------
 1 file changed, 6 deletions(-)

diff --git a/arch/powerpc/platforms/pseries/smp.c b/arch/powerpc/platforms/pseries/smp.c
index fd2174edfa1d..2bcfee86ff87 100644
--- a/arch/powerpc/platforms/pseries/smp.c
+++ b/arch/powerpc/platforms/pseries/smp.c
@@ -278,11 +278,5 @@ void __init smp_init_pseries(void)
 		cpumask_clear_cpu(boot_cpuid, of_spin_mask);
 	}
 
-	/* Non-lpar has additional take/give timebase */
-	if (rtas_token("freeze-time-base") != RTAS_UNKNOWN_SERVICE) {
-		smp_ops->give_timebase = rtas_give_timebase;
-		smp_ops->take_timebase = rtas_take_timebase;
-	}
-
 	pr_debug(" <- smp_init_pSeries()\n");
 }

-- 
2.39.1


^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [PATCH v2 07/19] powerpc/rtas: improve function information lookups
  2023-02-06 18:54 ` Nathan Lynch via B4 Submission Endpoint
@ 2023-02-06 18:54   ` Nathan Lynch
  -1 siblings, 0 replies; 50+ messages in thread
From: Nathan Lynch via B4 Submission Endpoint @ 2023-02-06 18:54 UTC (permalink / raw)
  To: Michael Ellerman, Nicholas Piggin, Christophe Leroy, Kajol Jain,
	Laurent Dufour, Mahesh J Salgaonkar, Andrew Donnellan,
	Nick Child
  Cc: Nathan Lynch, linuxppc-dev

From: Nathan Lynch <nathanl@linux.ibm.com>

The core RTAS support code and its clients perform two types of lookup
for RTAS firmware function information.

First, mapping a known function name to a token. The typical use case
invokes rtas_token() to retrieve the token value to pass to
rtas_call(). rtas_token() relies on of_get_property(), which performs
a linear search of the /rtas node's property list under a lock with
IRQs disabled.

Second, and less common: given a token value, looking up some
information about the function. The primary example is the sys_rtas
filter path, which linearly scans a small table to match the token to
a rtas_filter struct. Another use case to come is RTAS entry/exit
tracepoints, which will require efficient lookup of function names
from token values. Currently there is no general API for this.

We need something much like the existing rtas_filters table, but more
general and organized to facilitate efficient lookups.

Introduce:

* A new rtas_function type, aggregating function name, token,
  and filter. Other function characteristics could be added in the
  future.

* An array of rtas_function, where each element corresponds to a known
  RTAS function. All information in the table is static save the token
  values, which are derived from the device tree at boot. The array is
  sorted by function name to allow binary search.

* A named constant for each known RTAS function, used to index the
  function array. These also will be used in a client-facing API to be
  added later.

* An xarray that maps valid tokens to rtas_function objects.

Fold the existing rtas_filter table into the new rtas_function array,
with the appropriate adjustments to block_rtas_call(). Remove
now-redundant fields from struct rtas_filter. Preserve the function of
the CONFIG_CPU_BIG_ENDIAN guard in the current filter table by
introducing a per-function flag that is set for the function entries
related to pseries LPAR migration. These have never had working users
via sys_rtas on ppc64le; see commit de0f7349a0dd ("powerpc/rtas:
prevent suspend-related sys_rtas use on LE").

Convert rtas_token() to use a lockless binary search on the function
table. Fall back to the old behavior for lookups against names that
are not known to be RTAS functions, but issue a warning. rtas_token()
is for function names; it is not a general facility for accessing
arbitrary properties of the /rtas node. All known misuses of
rtas_token() have been converted to more appropriate of_ APIs in
preceding changes.

Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
---
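Reviewer note, not part of the patch: a small userspace sketch of the
name-to-token lookup described above, using the C library's bsearch()
over an enum-indexed table sorted by name. The identifiers (fnidx(),
struct fn_desc, fn_table, name_to_fn(), name_to_token(),
table_is_sorted()) are hypothetical and the table is cut down to four
entries; -1 plays the role of RTAS_UNKNOWN_SERVICE.

```c
#include <stdlib.h>
#include <string.h>

#define fnidx(x_) FNIDX__ ## x_

/* Enumerators are listed in the same order as the sorted names. */
enum fn_index {
	fnidx(EVENT_SCAN),
	fnidx(IBM_GET_SYSTEM_PARAMETER),
	fnidx(POWER_OFF),
	fnidx(START_CPU),
};

struct fn_desc {
	int token;		/* -1 until discovered at "boot" */
	const char *name;
};

static struct fn_desc fn_table[] = {
	[fnidx(EVENT_SCAN)] = { .token = -1, .name = "event-scan" },
	[fnidx(IBM_GET_SYSTEM_PARAMETER)] = {
		.token = -1, .name = "ibm,get-system-parameter",
	},
	[fnidx(POWER_OFF)] = { .token = -1, .name = "power-off" },
	[fnidx(START_CPU)] = { .token = -1, .name = "start-cpu" },
};

#define FN_TABLE_LEN (sizeof(fn_table) / sizeof(fn_table[0]))

static int fn_cmp(const void *a, const void *b)
{
	const struct fn_desc *f1 = a;
	const struct fn_desc *f2 = b;

	return strcmp(f1->name, f2->name);
}

/* Binary search by name; returns NULL for unknown names. */
static struct fn_desc *name_to_fn(const char *name)
{
	const struct fn_desc key = { .name = name };

	return bsearch(&key, fn_table, FN_TABLE_LEN,
		       sizeof(fn_table[0]), fn_cmp);
}

static int name_to_token(const char *name)
{
	const struct fn_desc *fn = name_to_fn(name);

	return fn ? fn->token : -1;
}

/* bsearch() is only valid if the table really is sorted by name. */
static int table_is_sorted(void)
{
	size_t i;

	for (i = 1; i < FN_TABLE_LEN; i++) {
		if (strcmp(fn_table[i - 1].name, fn_table[i].name) >= 0)
			return 0;
	}
	return 1;
}
```

Because the enum order and the name order coincide, the same table
serves both direct indexing by constant and binary search by name; a
sortedness check like table_is_sorted() is one way to guard that
invariant when entries are added.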
 arch/powerpc/include/asm/rtas.h |  87 +++++
 arch/powerpc/kernel/rtas.c      | 735 ++++++++++++++++++++++++++++++++++------
 2 files changed, 709 insertions(+), 113 deletions(-)

diff --git a/arch/powerpc/include/asm/rtas.h b/arch/powerpc/include/asm/rtas.h
index 479a95cb2770..14fe79217c26 100644
--- a/arch/powerpc/include/asm/rtas.h
+++ b/arch/powerpc/include/asm/rtas.h
@@ -16,6 +16,93 @@
  * Copyright (C) 2001 PPC 64 Team, IBM Corp
  */
 
+#define rtas_fnidx(x_) RTAS_FNIDX__ ## x_
+
+enum rtas_function_index {
+	rtas_fnidx(CHECK_EXCEPTION),
+	rtas_fnidx(DISPLAY_CHARACTER),
+	rtas_fnidx(EVENT_SCAN),
+	rtas_fnidx(FREEZE_TIME_BASE),
+	rtas_fnidx(GET_POWER_LEVEL),
+	rtas_fnidx(GET_SENSOR_STATE),
+	rtas_fnidx(GET_TERM_CHAR),
+	rtas_fnidx(GET_TIME_OF_DAY),
+	rtas_fnidx(IBM_ACTIVATE_FIRMWARE),
+	rtas_fnidx(IBM_CBE_START_PTCAL),
+	rtas_fnidx(IBM_CBE_STOP_PTCAL),
+	rtas_fnidx(IBM_CHANGE_MSI),
+	rtas_fnidx(IBM_CLOSE_ERRINJCT),
+	rtas_fnidx(IBM_CONFIGURE_BRIDGE),
+	rtas_fnidx(IBM_CONFIGURE_CONNECTOR),
+	rtas_fnidx(IBM_CONFIGURE_KERNEL_DUMP),
+	rtas_fnidx(IBM_CONFIGURE_PE),
+	rtas_fnidx(IBM_CREATE_PE_DMA_WINDOW),
+	rtas_fnidx(IBM_DISPLAY_MESSAGE),
+	rtas_fnidx(IBM_ERRINJCT),
+	rtas_fnidx(IBM_EXTI2C),
+	rtas_fnidx(IBM_GET_CONFIG_ADDR_INFO),
+	rtas_fnidx(IBM_GET_CONFIG_ADDR_INFO2),
+	rtas_fnidx(IBM_GET_DYNAMIC_SENSOR_STATE),
+	rtas_fnidx(IBM_GET_INDICES),
+	rtas_fnidx(IBM_GET_RIO_TOPOLOGY),
+	rtas_fnidx(IBM_GET_SYSTEM_PARAMETER),
+	rtas_fnidx(IBM_GET_VPD),
+	rtas_fnidx(IBM_GET_XIVE),
+	rtas_fnidx(IBM_INT_OFF),
+	rtas_fnidx(IBM_INT_ON),
+	rtas_fnidx(IBM_IO_QUIESCE_ACK),
+	rtas_fnidx(IBM_LPAR_PERFTOOLS),
+	rtas_fnidx(IBM_MANAGE_FLASH_IMAGE),
+	rtas_fnidx(IBM_MANAGE_STORAGE_PRESERVATION),
+	rtas_fnidx(IBM_NMI_INTERLOCK),
+	rtas_fnidx(IBM_NMI_REGISTER),
+	rtas_fnidx(IBM_OPEN_ERRINJCT),
+	rtas_fnidx(IBM_OPEN_SRIOV_ALLOW_UNFREEZE),
+	rtas_fnidx(IBM_OPEN_SRIOV_MAP_PE_NUMBER),
+	rtas_fnidx(IBM_OS_TERM),
+	rtas_fnidx(IBM_PARTNER_CONTROL),
+	rtas_fnidx(IBM_PHYSICAL_ATTESTATION),
+	rtas_fnidx(IBM_PLATFORM_DUMP),
+	rtas_fnidx(IBM_POWER_OFF_UPS),
+	rtas_fnidx(IBM_QUERY_INTERRUPT_SOURCE_NUMBER),
+	rtas_fnidx(IBM_QUERY_PE_DMA_WINDOW),
+	rtas_fnidx(IBM_READ_PCI_CONFIG),
+	rtas_fnidx(IBM_READ_SLOT_RESET_STATE),
+	rtas_fnidx(IBM_READ_SLOT_RESET_STATE2),
+	rtas_fnidx(IBM_REMOVE_PE_DMA_WINDOW),
+	rtas_fnidx(IBM_RESET_PE_DMA_WINDOWS),
+	rtas_fnidx(IBM_SCAN_LOG_DUMP),
+	rtas_fnidx(IBM_SET_DYNAMIC_INDICATOR),
+	rtas_fnidx(IBM_SET_EEH_OPTION),
+	rtas_fnidx(IBM_SET_SLOT_RESET),
+	rtas_fnidx(IBM_SET_SYSTEM_PARAMETER),
+	rtas_fnidx(IBM_SET_XIVE),
+	rtas_fnidx(IBM_SLOT_ERROR_DETAIL),
+	rtas_fnidx(IBM_SUSPEND_ME),
+	rtas_fnidx(IBM_TUNE_DMA_PARMS),
+	rtas_fnidx(IBM_UPDATE_FLASH_64_AND_REBOOT),
+	rtas_fnidx(IBM_UPDATE_NODES),
+	rtas_fnidx(IBM_UPDATE_PROPERTIES),
+	rtas_fnidx(IBM_VALIDATE_FLASH_IMAGE),
+	rtas_fnidx(IBM_WRITE_PCI_CONFIG),
+	rtas_fnidx(NVRAM_FETCH),
+	rtas_fnidx(NVRAM_STORE),
+	rtas_fnidx(POWER_OFF),
+	rtas_fnidx(PUT_TERM_CHAR),
+	rtas_fnidx(QUERY_CPU_STOPPED_STATE),
+	rtas_fnidx(READ_PCI_CONFIG),
+	rtas_fnidx(RTAS_LAST_ERROR),
+	rtas_fnidx(SET_INDICATOR),
+	rtas_fnidx(SET_POWER_LEVEL),
+	rtas_fnidx(SET_TIME_FOR_POWER_ON),
+	rtas_fnidx(SET_TIME_OF_DAY),
+	rtas_fnidx(START_CPU),
+	rtas_fnidx(STOP_SELF),
+	rtas_fnidx(SYSTEM_REBOOT),
+	rtas_fnidx(THAW_TIME_BASE),
+	rtas_fnidx(WRITE_PCI_CONFIG),
+};
+
 #define RTAS_UNKNOWN_SERVICE (-1)
 #define RTAS_INSTANTIATE_MAX (1ULL<<30) /* Don't instantiate rtas at/above this value */
 
diff --git a/arch/powerpc/kernel/rtas.c b/arch/powerpc/kernel/rtas.c
index ec2df09a70cf..2804382c74b1 100644
--- a/arch/powerpc/kernel/rtas.c
+++ b/arch/powerpc/kernel/rtas.c
@@ -9,10 +9,12 @@
 
 #define pr_fmt(fmt)	"rtas: " fmt
 
+#include <linux/bsearch.h>
 #include <linux/capability.h>
 #include <linux/delay.h>
 #include <linux/export.h>
 #include <linux/init.h>
+#include <linux/kconfig.h>
 #include <linux/kernel.h>
 #include <linux/memblock.h>
 #include <linux/of.h>
@@ -26,6 +28,7 @@
 #include <linux/syscalls.h>
 #include <linux/types.h>
 #include <linux/uaccess.h>
+#include <linux/xarray.h>
 
 #include <asm/delay.h>
 #include <asm/firmware.h>
@@ -37,6 +40,486 @@
 #include <asm/time.h>
 #include <asm/udbg.h>
 
+struct rtas_filter {
+	/* Indexes into the args buffer, -1 if not used */
+	const int buf_idx1;
+	const int size_idx1;
+	const int buf_idx2;
+	const int size_idx2;
+	/*
+	 * Assumed buffer size per the spec if the function does not
+	 * have a size parameter, e.g. ibm,errinjct. 0 if unused.
+	 */
+	const int fixed_size;
+};
+
+/**
+ * struct rtas_function - Descriptor for RTAS functions.
+ *
+ * @token: Value of @name if it exists under the /rtas node.
+ * @name: Function name.
+ * @filter: If non-NULL, invoking this function via the rtas syscall is
+ *          generally allowed, and @filter describes constraints on the
+ *          arguments. See also @banned_for_syscall_on_le.
+ * @banned_for_syscall_on_le: Set when call via sys_rtas is generally allowed
+ *                            but specifically restricted on ppc64le. Such
+ *                            functions are believed to have no users on
+ *                            ppc64le, and we want to keep it that way. It does
+ *                            not make sense for this to be set when @filter
+ *                            is NULL.
+ */
+struct rtas_function {
+	s32 token;
+	const bool banned_for_syscall_on_le:1;
+	const char * const name;
+	const struct rtas_filter *filter;
+};
+
+static struct rtas_function rtas_function_table[] __ro_after_init = {
+	[rtas_fnidx(CHECK_EXCEPTION)] = {
+		.name = "check-exception",
+	},
+	[rtas_fnidx(DISPLAY_CHARACTER)] = {
+		.name = "display-character",
+		.filter = &(const struct rtas_filter) {
+			.buf_idx1 = -1, .size_idx1 = -1,
+			.buf_idx2 = -1, .size_idx2 = -1,
+		},
+	},
+	[rtas_fnidx(EVENT_SCAN)] = {
+		.name = "event-scan",
+	},
+	[rtas_fnidx(FREEZE_TIME_BASE)] = {
+		.name = "freeze-time-base",
+	},
+	[rtas_fnidx(GET_POWER_LEVEL)] = {
+		.name = "get-power-level",
+		.filter = &(const struct rtas_filter) {
+			.buf_idx1 = -1, .size_idx1 = -1,
+			.buf_idx2 = -1, .size_idx2 = -1,
+		},
+	},
+	[rtas_fnidx(GET_SENSOR_STATE)] = {
+		.name = "get-sensor-state",
+		.filter = &(const struct rtas_filter) {
+			.buf_idx1 = -1, .size_idx1 = -1,
+			.buf_idx2 = -1, .size_idx2 = -1,
+		},
+	},
+	[rtas_fnidx(GET_TERM_CHAR)] = {
+		.name = "get-term-char",
+	},
+	[rtas_fnidx(GET_TIME_OF_DAY)] = {
+		.name = "get-time-of-day",
+		.filter = &(const struct rtas_filter) {
+			.buf_idx1 = -1, .size_idx1 = -1,
+			.buf_idx2 = -1, .size_idx2 = -1,
+		},
+	},
+	[rtas_fnidx(IBM_ACTIVATE_FIRMWARE)] = {
+		.name = "ibm,activate-firmware",
+		.filter = &(const struct rtas_filter) {
+			.buf_idx1 = -1, .size_idx1 = -1,
+			.buf_idx2 = -1, .size_idx2 = -1,
+		},
+	},
+	[rtas_fnidx(IBM_CBE_START_PTCAL)] = {
+		.name = "ibm,cbe-start-ptcal",
+	},
+	[rtas_fnidx(IBM_CBE_STOP_PTCAL)] = {
+		.name = "ibm,cbe-stop-ptcal",
+	},
+	[rtas_fnidx(IBM_CHANGE_MSI)] = {
+		.name = "ibm,change-msi",
+	},
+	[rtas_fnidx(IBM_CLOSE_ERRINJCT)] = {
+		.name = "ibm,close-errinjct",
+		.filter = &(const struct rtas_filter) {
+			.buf_idx1 = -1, .size_idx1 = -1,
+			.buf_idx2 = -1, .size_idx2 = -1,
+		},
+	},
+	[rtas_fnidx(IBM_CONFIGURE_BRIDGE)] = {
+		.name = "ibm,configure-bridge",
+	},
+	[rtas_fnidx(IBM_CONFIGURE_CONNECTOR)] = {
+		.name = "ibm,configure-connector",
+		.filter = &(const struct rtas_filter) {
+			.buf_idx1 = 0, .size_idx1 = -1,
+			.buf_idx2 = 1, .size_idx2 = -1,
+			.fixed_size = 4096,
+		},
+	},
+	[rtas_fnidx(IBM_CONFIGURE_KERNEL_DUMP)] = {
+		.name = "ibm,configure-kernel-dump",
+	},
+	[rtas_fnidx(IBM_CONFIGURE_PE)] = {
+		.name = "ibm,configure-pe",
+	},
+	[rtas_fnidx(IBM_CREATE_PE_DMA_WINDOW)] = {
+		.name = "ibm,create-pe-dma-window",
+	},
+	[rtas_fnidx(IBM_DISPLAY_MESSAGE)] = {
+		.name = "ibm,display-message",
+		.filter = &(const struct rtas_filter) {
+			.buf_idx1 = 0, .size_idx1 = -1,
+			.buf_idx2 = -1, .size_idx2 = -1,
+		},
+	},
+	[rtas_fnidx(IBM_ERRINJCT)] = {
+		.name = "ibm,errinjct",
+		.filter = &(const struct rtas_filter) {
+			.buf_idx1 = 2, .size_idx1 = -1,
+			.buf_idx2 = -1, .size_idx2 = -1,
+			.fixed_size = 1024,
+		},
+	},
+	[rtas_fnidx(IBM_EXTI2C)] = {
+		.name = "ibm,exti2c",
+	},
+	[rtas_fnidx(IBM_GET_CONFIG_ADDR_INFO)] = {
+		.name = "ibm,get-config-addr-info",
+	},
+	[rtas_fnidx(IBM_GET_CONFIG_ADDR_INFO2)] = {
+		.name = "ibm,get-config-addr-info2",
+		.filter = &(const struct rtas_filter) {
+			.buf_idx1 = -1, .size_idx1 = -1,
+			.buf_idx2 = -1, .size_idx2 = -1,
+		},
+	},
+	[rtas_fnidx(IBM_GET_DYNAMIC_SENSOR_STATE)] = {
+		.name = "ibm,get-dynamic-sensor-state",
+		.filter = &(const struct rtas_filter) {
+			.buf_idx1 = 1, .size_idx1 = -1,
+			.buf_idx2 = -1, .size_idx2 = -1,
+		},
+	},
+	[rtas_fnidx(IBM_GET_INDICES)] = {
+		.name = "ibm,get-indices",
+		.filter = &(const struct rtas_filter) {
+			.buf_idx1 = 2, .size_idx1 = 3,
+			.buf_idx2 = -1, .size_idx2 = -1,
+		},
+	},
+	[rtas_fnidx(IBM_GET_RIO_TOPOLOGY)] = {
+		.name = "ibm,get-rio-topology",
+	},
+	[rtas_fnidx(IBM_GET_SYSTEM_PARAMETER)] = {
+		.name = "ibm,get-system-parameter",
+		.filter = &(const struct rtas_filter) {
+			.buf_idx1 = 1, .size_idx1 = 2,
+			.buf_idx2 = -1, .size_idx2 = -1,
+		},
+	},
+	[rtas_fnidx(IBM_GET_VPD)] = {
+		.name = "ibm,get-vpd",
+		.filter = &(const struct rtas_filter) {
+			.buf_idx1 = 0, .size_idx1 = -1,
+			.buf_idx2 = 1, .size_idx2 = 2,
+		},
+	},
+	[rtas_fnidx(IBM_GET_XIVE)] = {
+		.name = "ibm,get-xive",
+	},
+	[rtas_fnidx(IBM_INT_OFF)] = {
+		.name = "ibm,int-off",
+	},
+	[rtas_fnidx(IBM_INT_ON)] = {
+		.name = "ibm,int-on",
+	},
+	[rtas_fnidx(IBM_IO_QUIESCE_ACK)] = {
+		.name = "ibm,io-quiesce-ack",
+	},
+	[rtas_fnidx(IBM_LPAR_PERFTOOLS)] = {
+		.name = "ibm,lpar-perftools",
+		.filter = &(const struct rtas_filter) {
+			.buf_idx1 = 2, .size_idx1 = 3,
+			.buf_idx2 = -1, .size_idx2 = -1,
+		},
+	},
+	[rtas_fnidx(IBM_MANAGE_FLASH_IMAGE)] = {
+		.name = "ibm,manage-flash-image",
+	},
+	[rtas_fnidx(IBM_MANAGE_STORAGE_PRESERVATION)] = {
+		.name = "ibm,manage-storage-preservation",
+	},
+	[rtas_fnidx(IBM_NMI_INTERLOCK)] = {
+		.name = "ibm,nmi-interlock",
+	},
+	[rtas_fnidx(IBM_NMI_REGISTER)] = {
+		.name = "ibm,nmi-register",
+	},
+	[rtas_fnidx(IBM_OPEN_ERRINJCT)] = {
+		.name = "ibm,open-errinjct",
+		.filter = &(const struct rtas_filter) {
+			.buf_idx1 = -1, .size_idx1 = -1,
+			.buf_idx2 = -1, .size_idx2 = -1,
+		},
+	},
+	[rtas_fnidx(IBM_OPEN_SRIOV_ALLOW_UNFREEZE)] = {
+		.name = "ibm,open-sriov-allow-unfreeze",
+	},
+	[rtas_fnidx(IBM_OPEN_SRIOV_MAP_PE_NUMBER)] = {
+		.name = "ibm,open-sriov-map-pe-number",
+	},
+	[rtas_fnidx(IBM_OS_TERM)] = {
+		.name = "ibm,os-term",
+	},
+	[rtas_fnidx(IBM_PARTNER_CONTROL)] = {
+		.name = "ibm,partner-control",
+	},
+	[rtas_fnidx(IBM_PHYSICAL_ATTESTATION)] = {
+		.name = "ibm,physical-attestation",
+		.filter = &(const struct rtas_filter) {
+			.buf_idx1 = 0, .size_idx1 = 1,
+			.buf_idx2 = -1, .size_idx2 = -1,
+		},
+	},
+	[rtas_fnidx(IBM_PLATFORM_DUMP)] = {
+		.name = "ibm,platform-dump",
+		.filter = &(const struct rtas_filter) {
+			.buf_idx1 = 4, .size_idx1 = 5,
+			.buf_idx2 = -1, .size_idx2 = -1,
+		},
+	},
+	[rtas_fnidx(IBM_POWER_OFF_UPS)] = {
+		.name = "ibm,power-off-ups",
+	},
+	[rtas_fnidx(IBM_QUERY_INTERRUPT_SOURCE_NUMBER)] = {
+		.name = "ibm,query-interrupt-source-number",
+	},
+	[rtas_fnidx(IBM_QUERY_PE_DMA_WINDOW)] = {
+		.name = "ibm,query-pe-dma-window",
+	},
+	[rtas_fnidx(IBM_READ_PCI_CONFIG)] = {
+		.name = "ibm,read-pci-config",
+	},
+	[rtas_fnidx(IBM_READ_SLOT_RESET_STATE)] = {
+		.name = "ibm,read-slot-reset-state",
+		.filter = &(const struct rtas_filter) {
+			.buf_idx1 = -1, .size_idx1 = -1,
+			.buf_idx2 = -1, .size_idx2 = -1,
+		},
+	},
+	[rtas_fnidx(IBM_READ_SLOT_RESET_STATE2)] = {
+		.name = "ibm,read-slot-reset-state2",
+	},
+	[rtas_fnidx(IBM_REMOVE_PE_DMA_WINDOW)] = {
+		.name = "ibm,remove-pe-dma-window",
+	},
+	[rtas_fnidx(IBM_RESET_PE_DMA_WINDOWS)] = {
+		.name = "ibm,reset-pe-dma-windows",
+	},
+	[rtas_fnidx(IBM_SCAN_LOG_DUMP)] = {
+		.name = "ibm,scan-log-dump",
+		.filter = &(const struct rtas_filter) {
+			.buf_idx1 = 0, .size_idx1 = 1,
+			.buf_idx2 = -1, .size_idx2 = -1,
+		},
+	},
+	[rtas_fnidx(IBM_SET_DYNAMIC_INDICATOR)] = {
+		.name = "ibm,set-dynamic-indicator",
+		.filter = &(const struct rtas_filter) {
+			.buf_idx1 = 2, .size_idx1 = -1,
+			.buf_idx2 = -1, .size_idx2 = -1,
+		},
+	},
+	[rtas_fnidx(IBM_SET_EEH_OPTION)] = {
+		.name = "ibm,set-eeh-option",
+		.filter = &(const struct rtas_filter) {
+			.buf_idx1 = -1, .size_idx1 = -1,
+			.buf_idx2 = -1, .size_idx2 = -1,
+		},
+	},
+	[rtas_fnidx(IBM_SET_SLOT_RESET)] = {
+		.name = "ibm,set-slot-reset",
+	},
+	[rtas_fnidx(IBM_SET_SYSTEM_PARAMETER)] = {
+		.name = "ibm,set-system-parameter",
+		.filter = &(const struct rtas_filter) {
+			.buf_idx1 = 1, .size_idx1 = -1,
+			.buf_idx2 = -1, .size_idx2 = -1,
+		},
+	},
+	[rtas_fnidx(IBM_SET_XIVE)] = {
+		.name = "ibm,set-xive",
+	},
+	[rtas_fnidx(IBM_SLOT_ERROR_DETAIL)] = {
+		.name = "ibm,slot-error-detail",
+	},
+	[rtas_fnidx(IBM_SUSPEND_ME)] = {
+		.name = "ibm,suspend-me",
+		.banned_for_syscall_on_le = true,
+		.filter = &(const struct rtas_filter) {
+			.buf_idx1 = -1, .size_idx1 = -1,
+			.buf_idx2 = -1, .size_idx2 = -1,
+		},
+	},
+	[rtas_fnidx(IBM_TUNE_DMA_PARMS)] = {
+		.name = "ibm,tune-dma-parms",
+	},
+	[rtas_fnidx(IBM_UPDATE_FLASH_64_AND_REBOOT)] = {
+		.name = "ibm,update-flash-64-and-reboot",
+	},
+	[rtas_fnidx(IBM_UPDATE_NODES)] = {
+		.name = "ibm,update-nodes",
+		.banned_for_syscall_on_le = true,
+		.filter = &(const struct rtas_filter) {
+			.buf_idx1 = 0, .size_idx1 = -1,
+			.buf_idx2 = -1, .size_idx2 = -1,
+			.fixed_size = 4096,
+		},
+	},
+	[rtas_fnidx(IBM_UPDATE_PROPERTIES)] = {
+		.name = "ibm,update-properties",
+		.banned_for_syscall_on_le = true,
+		.filter = &(const struct rtas_filter) {
+			.buf_idx1 = 0, .size_idx1 = -1,
+			.buf_idx2 = -1, .size_idx2 = -1,
+			.fixed_size = 4096,
+		},
+	},
+	[rtas_fnidx(IBM_VALIDATE_FLASH_IMAGE)] = {
+		.name = "ibm,validate-flash-image",
+	},
+	[rtas_fnidx(IBM_WRITE_PCI_CONFIG)] = {
+		.name = "ibm,write-pci-config",
+	},
+	[rtas_fnidx(NVRAM_FETCH)] = {
+		.name = "nvram-fetch",
+	},
+	[rtas_fnidx(NVRAM_STORE)] = {
+		.name = "nvram-store",
+	},
+	[rtas_fnidx(POWER_OFF)] = {
+		.name = "power-off",
+	},
+	[rtas_fnidx(PUT_TERM_CHAR)] = {
+		.name = "put-term-char",
+	},
+	[rtas_fnidx(QUERY_CPU_STOPPED_STATE)] = {
+		.name = "query-cpu-stopped-state",
+	},
+	[rtas_fnidx(READ_PCI_CONFIG)] = {
+		.name = "read-pci-config",
+	},
+	[rtas_fnidx(RTAS_LAST_ERROR)] = {
+		.name = "rtas-last-error",
+	},
+	[rtas_fnidx(SET_INDICATOR)] = {
+		.name = "set-indicator",
+		.filter = &(const struct rtas_filter) {
+			.buf_idx1 = -1, .size_idx1 = -1,
+			.buf_idx2 = -1, .size_idx2 = -1,
+		},
+	},
+	[rtas_fnidx(SET_POWER_LEVEL)] = {
+		.name = "set-power-level",
+		.filter = &(const struct rtas_filter) {
+			.buf_idx1 = -1, .size_idx1 = -1,
+			.buf_idx2 = -1, .size_idx2 = -1,
+		},
+	},
+	[rtas_fnidx(SET_TIME_FOR_POWER_ON)] = {
+		.name = "set-time-for-power-on",
+		.filter = &(const struct rtas_filter) {
+			.buf_idx1 = -1, .size_idx1 = -1,
+			.buf_idx2 = -1, .size_idx2 = -1,
+		},
+	},
+	[rtas_fnidx(SET_TIME_OF_DAY)] = {
+		.name = "set-time-of-day",
+		.filter = &(const struct rtas_filter) {
+			.buf_idx1 = -1, .size_idx1 = -1,
+			.buf_idx2 = -1, .size_idx2 = -1,
+		},
+	},
+	[rtas_fnidx(START_CPU)] = {
+		.name = "start-cpu",
+	},
+	[rtas_fnidx(STOP_SELF)] = {
+		.name = "stop-self",
+	},
+	[rtas_fnidx(SYSTEM_REBOOT)] = {
+		.name = "system-reboot",
+	},
+	[rtas_fnidx(THAW_TIME_BASE)] = {
+		.name = "thaw-time-base",
+	},
+	[rtas_fnidx(WRITE_PCI_CONFIG)] = {
+		.name = "write-pci-config",
+	},
+};
+
+static int rtas_function_cmp(const void *a, const void *b)
+{
+	const struct rtas_function *f1 = a;
+	const struct rtas_function *f2 = b;
+
+	return strcmp(f1->name, f2->name);
+}
+
+/*
+ * Boot-time initialization of the function table needs the lookup to
+ * return a non-const-qualified object. Use rtas_name_to_function()
+ * in all other contexts.
+ */
+static struct rtas_function *__rtas_name_to_function(const char *name)
+{
+	const struct rtas_function key = {
+		.name = name,
+	};
+	struct rtas_function *found;
+
+	found = bsearch(&key, rtas_function_table, ARRAY_SIZE(rtas_function_table),
+			sizeof(rtas_function_table[0]), rtas_function_cmp);
+
+	return found;
+}
+
+static const struct rtas_function *rtas_name_to_function(const char *name)
+{
+	return __rtas_name_to_function(name);
+}
+
+static DEFINE_XARRAY(rtas_token_to_function_xarray);
+
+static int __init rtas_token_to_function_xarray_init(void)
+{
+	int err = 0;
+
+	for (size_t i = 0; i < ARRAY_SIZE(rtas_function_table); ++i) {
+		const struct rtas_function *func = &rtas_function_table[i];
+		const s32 token = func->token;
+
+		if (token == RTAS_UNKNOWN_SERVICE)
+			continue;
+
+		err = xa_err(xa_store(&rtas_token_to_function_xarray,
+				      token, (void *)func, GFP_KERNEL));
+		if (err)
+			break;
+	}
+
+	return err;
+}
+arch_initcall(rtas_token_to_function_xarray_init);
+
+static const struct rtas_function *rtas_token_to_function(s32 token)
+{
+	const struct rtas_function *func;
+
+	if (WARN_ONCE(token < 0, "invalid token %d", token))
+		return NULL;
+
+	func = xa_load(&rtas_token_to_function_xarray, token);
+
+	if (WARN_ONCE(!func, "unexpected failed lookup for token %d", token))
+		return NULL;
+
+	return func;
+}
+
 /* This is here deliberately so it's only used in this file */
 void enter_rtas(unsigned long);
 
@@ -315,9 +798,25 @@ EXPORT_SYMBOL_GPL(rtas_progress);		/* needed by rtas_flash module */
 
 int rtas_token(const char *service)
 {
+	const struct rtas_function *func;
 	const __be32 *tokp;
+
 	if (rtas.dev == NULL)
 		return RTAS_UNKNOWN_SERVICE;
+
+	func = rtas_name_to_function(service);
+	if (func)
+		return func->token;
+	/*
+	 * The caller is looking up a name that is not known to be an
+	 * RTAS function. Either it's a function that needs to be
+	 * added to the table, or they're misusing rtas_token() to
+	 * access non-function properties of the /rtas node. Warn and
+	 * fall back to the legacy behavior.
+	 */
+	WARN_ONCE(1, "unknown function `%s`, should it be added to rtas_function_table?\n",
+		  service);
+
 	tokp = of_get_property(rtas.dev, service, NULL);
 	return tokp ? be32_to_cpu(*tokp) : RTAS_UNKNOWN_SERVICE;
 }
@@ -1089,56 +1588,12 @@ noinstr struct pseries_errorlog *get_pseries_errorlog(struct rtas_error_log *log
  *
  * Accordingly, we filter RTAS requests to check that the call is
  * permitted, and that provided pointers fall within the RMO buffer.
- * The rtas_filters list contains an entry for each permitted call,
- * with the indexes of the parameters which are expected to contain
- * addresses and sizes of buffers allocated inside the RMO buffer.
+ * If a function is allowed to be invoked via the syscall, then its
+ * entry in rtas_function_table points to a rtas_filter that
+ * describes its constraints, with the indexes of the parameters which
+ * are expected to contain addresses and sizes of buffers allocated
+ * inside the RMO buffer.
  */
-struct rtas_filter {
-	const char *name;
-	int token;
-	/* Indexes into the args buffer, -1 if not used */
-	int buf_idx1;
-	int size_idx1;
-	int buf_idx2;
-	int size_idx2;
-
-	int fixed_size;
-};
-
-static struct rtas_filter rtas_filters[] __ro_after_init = {
-	{ "ibm,activate-firmware", -1, -1, -1, -1, -1 },
-	{ "ibm,configure-connector", -1, 0, -1, 1, -1, 4096 },	/* Special cased */
-	{ "display-character", -1, -1, -1, -1, -1 },
-	{ "ibm,display-message", -1, 0, -1, -1, -1 },
-	{ "ibm,errinjct", -1, 2, -1, -1, -1, 1024 },
-	{ "ibm,close-errinjct", -1, -1, -1, -1, -1 },
-	{ "ibm,open-errinjct", -1, -1, -1, -1, -1 },
-	{ "ibm,get-config-addr-info2", -1, -1, -1, -1, -1 },
-	{ "ibm,get-dynamic-sensor-state", -1, 1, -1, -1, -1 },
-	{ "ibm,get-indices", -1, 2, 3, -1, -1 },
-	{ "get-power-level", -1, -1, -1, -1, -1 },
-	{ "get-sensor-state", -1, -1, -1, -1, -1 },
-	{ "ibm,get-system-parameter", -1, 1, 2, -1, -1 },
-	{ "get-time-of-day", -1, -1, -1, -1, -1 },
-	{ "ibm,get-vpd", -1, 0, -1, 1, 2 },
-	{ "ibm,lpar-perftools", -1, 2, 3, -1, -1 },
-	{ "ibm,platform-dump", -1, 4, 5, -1, -1 },		/* Special cased */
-	{ "ibm,read-slot-reset-state", -1, -1, -1, -1, -1 },
-	{ "ibm,scan-log-dump", -1, 0, 1, -1, -1 },
-	{ "ibm,set-dynamic-indicator", -1, 2, -1, -1, -1 },
-	{ "ibm,set-eeh-option", -1, -1, -1, -1, -1 },
-	{ "set-indicator", -1, -1, -1, -1, -1 },
-	{ "set-power-level", -1, -1, -1, -1, -1 },
-	{ "set-time-for-power-on", -1, -1, -1, -1, -1 },
-	{ "ibm,set-system-parameter", -1, 1, -1, -1, -1 },
-	{ "set-time-of-day", -1, -1, -1, -1, -1 },
-#ifdef CONFIG_CPU_BIG_ENDIAN
-	{ "ibm,suspend-me", -1, -1, -1, -1, -1 },
-	{ "ibm,update-nodes", -1, 0, -1, -1, -1, 4096 },
-	{ "ibm,update-properties", -1, 0, -1, -1, -1, 4096 },
-#endif
-	{ "ibm,physical-attestation", -1, 0, 1, -1, -1 },
-};
 
 static bool in_rmo_buf(u32 base, u32 end)
 {
@@ -1152,63 +1607,75 @@ static bool in_rmo_buf(u32 base, u32 end)
 static bool block_rtas_call(int token, int nargs,
 			    struct rtas_args *args)
 {
-	int i;
-
-	for (i = 0; i < ARRAY_SIZE(rtas_filters); i++) {
-		struct rtas_filter *f = &rtas_filters[i];
-		u32 base, size, end;
-
-		if (token != f->token)
-			continue;
-
-		if (f->buf_idx1 != -1) {
-			base = be32_to_cpu(args->args[f->buf_idx1]);
-			if (f->size_idx1 != -1)
-				size = be32_to_cpu(args->args[f->size_idx1]);
-			else if (f->fixed_size)
-				size = f->fixed_size;
-			else
-				size = 1;
-
-			end = base + size - 1;
-
-			/*
-			 * Special case for ibm,platform-dump - NULL buffer
-			 * address is used to indicate end of dump processing
-			 */
-			if (!strcmp(f->name, "ibm,platform-dump") &&
-			    base == 0)
-				return false;
-
-			if (!in_rmo_buf(base, end))
-				goto err;
-		}
-
-		if (f->buf_idx2 != -1) {
-			base = be32_to_cpu(args->args[f->buf_idx2]);
-			if (f->size_idx2 != -1)
-				size = be32_to_cpu(args->args[f->size_idx2]);
-			else if (f->fixed_size)
-				size = f->fixed_size;
-			else
-				size = 1;
-			end = base + size - 1;
-
-			/*
-			 * Special case for ibm,configure-connector where the
-			 * address can be 0
-			 */
-			if (!strcmp(f->name, "ibm,configure-connector") &&
-			    base == 0)
-				return false;
-
-			if (!in_rmo_buf(base, end))
-				goto err;
-		}
-
-		return false;
+	const struct rtas_function *func;
+	const struct rtas_filter *f;
+	u32 base, size, end;
+
+	/*
+	 * If this token doesn't correspond to a function the kernel
+	 * understands, you're not allowed to call it.
+	 */
+	func = rtas_token_to_function(token);
+	if (!func)
+		goto err;
+	/*
+	 * And only functions with filters attached are allowed.
+	 */
+	f = func->filter;
+	if (!f)
+		goto err;
+	/*
+	 * And some functions aren't allowed on LE.
+	 */
+	if (IS_ENABLED(CONFIG_CPU_LITTLE_ENDIAN) && func->banned_for_syscall_on_le)
+		goto err;
+
+	if (f->buf_idx1 != -1) {
+		base = be32_to_cpu(args->args[f->buf_idx1]);
+		if (f->size_idx1 != -1)
+			size = be32_to_cpu(args->args[f->size_idx1]);
+		else if (f->fixed_size)
+			size = f->fixed_size;
+		else
+			size = 1;
+
+		end = base + size - 1;
+
+		/*
+		 * Special case for ibm,platform-dump - NULL buffer
+		 * address is used to indicate end of dump processing
+		 */
+		if (!strcmp(func->name, "ibm,platform-dump") &&
+		    base == 0)
+			return false;
+
+		if (!in_rmo_buf(base, end))
+			goto err;
+	}
+
+	if (f->buf_idx2 != -1) {
+		base = be32_to_cpu(args->args[f->buf_idx2]);
+		if (f->size_idx2 != -1)
+			size = be32_to_cpu(args->args[f->size_idx2]);
+		else if (f->fixed_size)
+			size = f->fixed_size;
+		else
+			size = 1;
+		end = base + size - 1;
+
+		/*
+		 * Special case for ibm,configure-connector where the
+		 * address can be 0
+		 */
+		if (!strcmp(func->name, "ibm,configure-connector") &&
+		    base == 0)
+			return false;
+
+		if (!in_rmo_buf(base, end))
+			goto err;
 	}
 
+	return false;
 err:
 	pr_err_ratelimited("sys_rtas: RTAS call blocked - exploit attempt?\n");
 	pr_err_ratelimited("sys_rtas: token=0x%x, nargs=%d (called by %s)\n",
@@ -1216,14 +1683,6 @@ static bool block_rtas_call(int token, int nargs,
 	return true;
 }
 
-static void __init rtas_syscall_filter_init(void)
-{
-	unsigned int i;
-
-	for (i = 0; i < ARRAY_SIZE(rtas_filters); i++)
-		rtas_filters[i].token = rtas_token(rtas_filters[i].name);
-}
-
 /* We assume to be passed big endian arguments */
 SYSCALL_DEFINE1(rtas, struct rtas_args __user *, uargs)
 {
@@ -1323,6 +1782,54 @@ SYSCALL_DEFINE1(rtas, struct rtas_args __user *, uargs)
 	return 0;
 }
 
+static void __init rtas_function_table_init(void)
+{
+	struct property *prop;
+
+	for (size_t i = 0; i < ARRAY_SIZE(rtas_function_table); ++i) {
+		struct rtas_function *curr = &rtas_function_table[i];
+		struct rtas_function *prior;
+		int cmp;
+
+		curr->token = RTAS_UNKNOWN_SERVICE;
+
+		if (i == 0)
+			continue;
+		/*
+		 * Ensure table is sorted correctly for binary search
+		 * on function names.
+		 */
+		prior = &rtas_function_table[i - 1];
+
+		cmp = strcmp(prior->name, curr->name);
+		if (cmp < 0)
+			continue;
+
+		if (cmp == 0) {
+			pr_err("'%s' has duplicate function table entries\n",
+			       curr->name);
+		} else {
+			pr_err("function table unsorted: '%s' wrongly precedes '%s'\n",
+			       prior->name, curr->name);
+		}
+	}
+
+	for_each_property_of_node(rtas.dev, prop) {
+		struct rtas_function *func;
+
+		if (prop->length != sizeof(u32))
+			continue;
+
+		func = __rtas_name_to_function(prop->name);
+		if (!func)
+			continue;
+
+		func->token = be32_to_cpup((__be32 *)prop->value);
+
+		pr_debug("function %s has token %u\n", func->name, func->token);
+	}
+}
+
 /*
  * Call early during boot, before mem init, to retrieve the RTAS
  * information from the device-tree and allocate the RMO buffer for userland
@@ -1356,6 +1863,9 @@ void __init rtas_initialize(void)
 
 	init_error_log_max();
 
+	/* Must be called before any function token lookups */
+	rtas_function_table_init();
+
 	/*
 	 * Discover these now to avoid device tree lookups in the
 	 * panic path.
@@ -1381,7 +1891,6 @@ void __init rtas_initialize(void)
 #endif
 	ibm_open_errinjct_token = rtas_token("ibm,open-errinjct");
 	ibm_errinjct_token = rtas_token("ibm,errinjct");
-	rtas_syscall_filter_init();
 }
 
 int __init early_init_dt_scan_rtas(unsigned long node,

-- 
2.39.1



* [PATCH v2 07/19] powerpc/rtas: improve function information lookups
@ 2023-02-06 18:54   ` Nathan Lynch
  0 siblings, 0 replies; 50+ messages in thread
From: Nathan Lynch @ 2023-02-06 18:54 UTC (permalink / raw)
  To: Michael Ellerman, Nicholas Piggin, Christophe Leroy, Kajol Jain,
	Laurent Dufour, Mahesh J Salgaonkar, Andrew Donnellan,
	Nick Child
  Cc: linuxppc-dev, Nathan Lynch

The core RTAS support code and its clients perform two types of lookup
for RTAS firmware function information.

First, mapping a known function name to a token. The typical use case
invokes rtas_token() to retrieve the token value to pass to
rtas_call(). rtas_token() relies on of_get_property(), which performs
a linear search of the /rtas node's property list under a lock with
IRQs disabled.

Second, and less common: given a token value, looking up some
information about the function. The primary example is the sys_rtas
filter path, which linearly scans a small table to match the token to
a rtas_filter struct. Another use case to come is RTAS entry/exit
tracepoints, which will require efficient lookup of function names
from token values. Currently there is no general API for this.

We need something much like the existing rtas_filters table, but more
general and organized to facilitate efficient lookups.

Introduce:

* A new rtas_function type, aggregating function name, token,
  and filter. Other function characteristics could be added in the
  future.

* An array of rtas_function, where each element corresponds to a known
  RTAS function. All information in the table is static save the token
  values, which are derived from the device tree at boot. The array is
  sorted by function name to allow binary search.

* A named constant for each known RTAS function, used to index the
  function array. These also will be used in a client-facing API to be
  added later.

* An xarray that maps valid tokens to rtas_function objects.
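
For illustration only, the sorted-array arrangement described above can
be modelled in user space roughly as follows. This is a minimal sketch,
not the code in this patch: fn_desc, fn_table, the FNIDX_ constants and
the helper names are hypothetical stand-ins, the table is cut down to
four entries, and the C library's bsearch() takes the place of the
kernel's.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define UNKNOWN_SERVICE (-1)

/* Hypothetical, shortened stand-in for enum rtas_function_index. */
enum fn_index {
	FNIDX_EVENT_SCAN,
	FNIDX_GET_TIME_OF_DAY,
	FNIDX_IBM_GET_VPD,
	FNIDX_POWER_OFF,
	FNIDX_COUNT,
};

struct fn_desc {
	int token;		/* UNKNOWN_SERVICE until firmware reports one */
	const char *name;	/* sort and search key */
};

/* The enum order must match the strcmp() order of the names. */
static struct fn_desc fn_table[FNIDX_COUNT] = {
	[FNIDX_EVENT_SCAN]	= { UNKNOWN_SERVICE, "event-scan" },
	[FNIDX_GET_TIME_OF_DAY]	= { UNKNOWN_SERVICE, "get-time-of-day" },
	[FNIDX_IBM_GET_VPD]	= { UNKNOWN_SERVICE, "ibm,get-vpd" },
	[FNIDX_POWER_OFF]	= { UNKNOWN_SERVICE, "power-off" },
};

static int fn_cmp(const void *a, const void *b)
{
	const struct fn_desc *f1 = a;
	const struct fn_desc *f2 = b;

	return strcmp(f1->name, f2->name);
}

/* Binary search by name; NULL if the name is not in the table. */
static struct fn_desc *name_to_fn(const char *name)
{
	const struct fn_desc key = { .name = name };

	assert(name != NULL);
	return bsearch(&key, fn_table, FNIDX_COUNT,
		       sizeof(fn_table[0]), fn_cmp);
}

/* True if each entry sorts strictly after its predecessor. */
static bool fn_table_is_sorted(void)
{
	for (size_t i = 1; i < FNIDX_COUNT; i++) {
		if (strcmp(fn_table[i - 1].name, fn_table[i].name) >= 0)
			return false;
	}
	return true;
}

/* Models the boot-time step that copies a token out of the device tree. */
static bool fn_set_token(const char *name, int token)
{
	struct fn_desc *fn = name_to_fn(name);

	if (!fn)
		return false;
	fn->token = token;
	return true;
}
```

Because the designated initializers are keyed by the enum, a misordered
enum silently yields an unsorted array; that is why a sortedness check
at initialization is worth having in any design of this shape.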

Fold the existing rtas_filter table into the new rtas_function array,
with the appropriate adjustments to block_rtas_call(). Remove
now-redundant fields from struct rtas_filter. Preserve the function of
the CONFIG_CPU_BIG_ENDIAN guard in the current filter table by
introducing a per-function flag that is set for the function entries
related to pseries LPAR migration. These have never had working users
via sys_rtas on ppc64le; see commit de0f7349a0dd ("powerpc/rtas:
prevent suspend-related sys_rtas use on LE").
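
As a rough user-space model of the resulting decision order (function
known, filter attached, not banned on LE, buffer inside the permitted
region), consider the sketch below. It is deliberately simplified and
is not the code in this patch: it handles a single buffer/size pair,
takes arguments in CPU byte order, omits the ibm,platform-dump and
ibm,configure-connector special cases, and all names (filter, fn_desc,
region, block_call) are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct filter {
	int buf_idx;		/* index of the buffer address arg, -1 if none */
	int size_idx;		/* index of the size arg, -1 if none */
	uint32_t fixed_size;	/* used when size_idx is -1; 0 if unused */
};

struct fn_desc {
	const char *name;
	const struct filter *filter;	/* NULL: not callable from user space */
	bool banned_on_le;
};

/* Hypothetical bounds of the user-visible buffer region; end is exclusive. */
struct region {
	uint32_t base;
	uint32_t end;
};

/* Both the first and last byte must lie inside the region. */
static bool in_region(const struct region *r, uint32_t first, uint32_t last)
{
	return first >= r->base && first < r->end &&
	       first <= last &&
	       last >= r->base && last < r->end;
}

/* Returns true if the call must be blocked. */
static bool block_call(const struct fn_desc *fn, const uint32_t *args,
		       const struct region *r, bool little_endian)
{
	const struct filter *f;
	uint32_t base, size;

	/* Unknown function, or known but without a filter: not allowed. */
	if (!fn || !fn->filter)
		return true;
	/* Allowed in general, but not on little-endian. */
	if (little_endian && fn->banned_on_le)
		return true;

	f = fn->filter;
	if (f->buf_idx == -1)
		return false;	/* no buffer argument to validate */

	base = args[f->buf_idx];
	if (f->size_idx != -1)
		size = args[f->size_idx];
	else if (f->fixed_size)
		size = f->fixed_size;
	else
		size = 1;

	/* A zero size wraps last below first and is rejected by in_region(). */
	return !in_region(r, base, base + size - 1);
}
```

The point of the ordering is that a missing filter and the LE ban are
both checked before any argument is read, so a function without a
filter is rejected regardless of what the caller passes.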

Convert rtas_token() to use a lockless binary search on the function
table. Fall back to the old behavior for lookups against names that
are not known to be RTAS functions, but issue a warning. rtas_token()
is for function names; it is not a general facility for accessing
arbitrary properties of the /rtas node. All known misuses of
rtas_token() have been converted to more appropriate of_ APIs in
preceding changes.
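
The fast-path/fallback behavior described above can be sketched in user
space as follows. This is an illustrative model under stated
assumptions, not the kernel code: node_props stands in for the /rtas
node's property list, fn_table for the sorted function table with
tokens already cached, and the token values, the slow_lookups counter
and all identifiers are made up for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define UNKNOWN_SERVICE (-1)

struct prop {
	const char *name;
	int value;
};

/* Stand-in for the /rtas property list: unsorted, searched linearly. */
static const struct prop node_props[] = {
	{ "power-off", 17 },
	{ "event-scan", 6 },
	{ "vendor,new-function", 99 },
};

struct fn_desc {
	const char *name;
	int token;
};

/* Stand-in for the function table: sorted by name, tokens cached at init. */
static const struct fn_desc fn_table[] = {
	{ "event-scan", 6 },
	{ "get-time-of-day", UNKNOWN_SERVICE },	/* known, absent in firmware */
	{ "power-off", 17 },
};

static int slow_lookups;	/* number of fallbacks taken, for illustration */

static int fn_cmp(const void *a, const void *b)
{
	const struct fn_desc *f1 = a;
	const struct fn_desc *f2 = b;

	return strcmp(f1->name, f2->name);
}

/* Legacy path: linear scan of the property list. */
static int prop_lookup(const char *name)
{
	for (size_t i = 0; i < sizeof(node_props) / sizeof(node_props[0]); i++) {
		if (strcmp(node_props[i].name, name) == 0)
			return node_props[i].value;
	}
	return UNKNOWN_SERVICE;
}

static int token_lookup(const char *service)
{
	static bool warned;
	const struct fn_desc key = { .name = service };
	const struct fn_desc *fn;

	assert(service != NULL);

	/* Fast path: binary search of the table, no property-list walk. */
	fn = bsearch(&key, fn_table, sizeof(fn_table) / sizeof(fn_table[0]),
		     sizeof(fn_table[0]), fn_cmp);
	if (fn)
		return fn->token;

	/* Name not in the table: warn once, then use the legacy path. */
	if (!warned) {
		warned = true;
		fprintf(stderr, "unknown function '%s'\n", service);
	}
	slow_lookups++;
	return prop_lookup(service);
}
```

Note that a name present in the table but absent from firmware returns
UNKNOWN_SERVICE straight from the fast path; only names missing from
the table reach the fallback.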

Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
---
 arch/powerpc/include/asm/rtas.h |  87 +++++
 arch/powerpc/kernel/rtas.c      | 735 ++++++++++++++++++++++++++++++++++------
 2 files changed, 709 insertions(+), 113 deletions(-)

diff --git a/arch/powerpc/include/asm/rtas.h b/arch/powerpc/include/asm/rtas.h
index 479a95cb2770..14fe79217c26 100644
--- a/arch/powerpc/include/asm/rtas.h
+++ b/arch/powerpc/include/asm/rtas.h
@@ -16,6 +16,93 @@
  * Copyright (C) 2001 PPC 64 Team, IBM Corp
  */
 
+#define rtas_fnidx(x_) RTAS_FNIDX__ ## x_
+
+enum rtas_function_index {
+	rtas_fnidx(CHECK_EXCEPTION),
+	rtas_fnidx(DISPLAY_CHARACTER),
+	rtas_fnidx(EVENT_SCAN),
+	rtas_fnidx(FREEZE_TIME_BASE),
+	rtas_fnidx(GET_POWER_LEVEL),
+	rtas_fnidx(GET_SENSOR_STATE),
+	rtas_fnidx(GET_TERM_CHAR),
+	rtas_fnidx(GET_TIME_OF_DAY),
+	rtas_fnidx(IBM_ACTIVATE_FIRMWARE),
+	rtas_fnidx(IBM_CBE_START_PTCAL),
+	rtas_fnidx(IBM_CBE_STOP_PTCAL),
+	rtas_fnidx(IBM_CHANGE_MSI),
+	rtas_fnidx(IBM_CLOSE_ERRINJCT),
+	rtas_fnidx(IBM_CONFIGURE_BRIDGE),
+	rtas_fnidx(IBM_CONFIGURE_CONNECTOR),
+	rtas_fnidx(IBM_CONFIGURE_KERNEL_DUMP),
+	rtas_fnidx(IBM_CONFIGURE_PE),
+	rtas_fnidx(IBM_CREATE_PE_DMA_WINDOW),
+	rtas_fnidx(IBM_DISPLAY_MESSAGE),
+	rtas_fnidx(IBM_ERRINJCT),
+	rtas_fnidx(IBM_EXTI2C),
+	rtas_fnidx(IBM_GET_CONFIG_ADDR_INFO),
+	rtas_fnidx(IBM_GET_CONFIG_ADDR_INFO2),
+	rtas_fnidx(IBM_GET_DYNAMIC_SENSOR_STATE),
+	rtas_fnidx(IBM_GET_INDICES),
+	rtas_fnidx(IBM_GET_RIO_TOPOLOGY),
+	rtas_fnidx(IBM_GET_SYSTEM_PARAMETER),
+	rtas_fnidx(IBM_GET_VPD),
+	rtas_fnidx(IBM_GET_XIVE),
+	rtas_fnidx(IBM_INT_OFF),
+	rtas_fnidx(IBM_INT_ON),
+	rtas_fnidx(IBM_IO_QUIESCE_ACK),
+	rtas_fnidx(IBM_LPAR_PERFTOOLS),
+	rtas_fnidx(IBM_MANAGE_FLASH_IMAGE),
+	rtas_fnidx(IBM_MANAGE_STORAGE_PRESERVATION),
+	rtas_fnidx(IBM_NMI_INTERLOCK),
+	rtas_fnidx(IBM_NMI_REGISTER),
+	rtas_fnidx(IBM_OPEN_ERRINJCT),
+	rtas_fnidx(IBM_OPEN_SRIOV_ALLOW_UNFREEZE),
+	rtas_fnidx(IBM_OPEN_SRIOV_MAP_PE_NUMBER),
+	rtas_fnidx(IBM_OS_TERM),
+	rtas_fnidx(IBM_PARTNER_CONTROL),
+	rtas_fnidx(IBM_PHYSICAL_ATTESTATION),
+	rtas_fnidx(IBM_PLATFORM_DUMP),
+	rtas_fnidx(IBM_POWER_OFF_UPS),
+	rtas_fnidx(IBM_QUERY_INTERRUPT_SOURCE_NUMBER),
+	rtas_fnidx(IBM_QUERY_PE_DMA_WINDOW),
+	rtas_fnidx(IBM_READ_PCI_CONFIG),
+	rtas_fnidx(IBM_READ_SLOT_RESET_STATE),
+	rtas_fnidx(IBM_READ_SLOT_RESET_STATE2),
+	rtas_fnidx(IBM_REMOVE_PE_DMA_WINDOW),
+	rtas_fnidx(IBM_RESET_PE_DMA_WINDOWS),
+	rtas_fnidx(IBM_SCAN_LOG_DUMP),
+	rtas_fnidx(IBM_SET_DYNAMIC_INDICATOR),
+	rtas_fnidx(IBM_SET_EEH_OPTION),
+	rtas_fnidx(IBM_SET_SLOT_RESET),
+	rtas_fnidx(IBM_SET_SYSTEM_PARAMETER),
+	rtas_fnidx(IBM_SET_XIVE),
+	rtas_fnidx(IBM_SLOT_ERROR_DETAIL),
+	rtas_fnidx(IBM_SUSPEND_ME),
+	rtas_fnidx(IBM_TUNE_DMA_PARMS),
+	rtas_fnidx(IBM_UPDATE_FLASH_64_AND_REBOOT),
+	rtas_fnidx(IBM_UPDATE_NODES),
+	rtas_fnidx(IBM_UPDATE_PROPERTIES),
+	rtas_fnidx(IBM_VALIDATE_FLASH_IMAGE),
+	rtas_fnidx(IBM_WRITE_PCI_CONFIG),
+	rtas_fnidx(NVRAM_FETCH),
+	rtas_fnidx(NVRAM_STORE),
+	rtas_fnidx(POWER_OFF),
+	rtas_fnidx(PUT_TERM_CHAR),
+	rtas_fnidx(QUERY_CPU_STOPPED_STATE),
+	rtas_fnidx(READ_PCI_CONFIG),
+	rtas_fnidx(RTAS_LAST_ERROR),
+	rtas_fnidx(SET_INDICATOR),
+	rtas_fnidx(SET_POWER_LEVEL),
+	rtas_fnidx(SET_TIME_FOR_POWER_ON),
+	rtas_fnidx(SET_TIME_OF_DAY),
+	rtas_fnidx(START_CPU),
+	rtas_fnidx(STOP_SELF),
+	rtas_fnidx(SYSTEM_REBOOT),
+	rtas_fnidx(THAW_TIME_BASE),
+	rtas_fnidx(WRITE_PCI_CONFIG),
+};
+
 #define RTAS_UNKNOWN_SERVICE (-1)
 #define RTAS_INSTANTIATE_MAX (1ULL<<30) /* Don't instantiate rtas at/above this value */
 
diff --git a/arch/powerpc/kernel/rtas.c b/arch/powerpc/kernel/rtas.c
index ec2df09a70cf..2804382c74b1 100644
--- a/arch/powerpc/kernel/rtas.c
+++ b/arch/powerpc/kernel/rtas.c
@@ -9,10 +9,12 @@
 
 #define pr_fmt(fmt)	"rtas: " fmt
 
+#include <linux/bsearch.h>
 #include <linux/capability.h>
 #include <linux/delay.h>
 #include <linux/export.h>
 #include <linux/init.h>
+#include <linux/kconfig.h>
 #include <linux/kernel.h>
 #include <linux/memblock.h>
 #include <linux/of.h>
@@ -26,6 +28,7 @@
 #include <linux/syscalls.h>
 #include <linux/types.h>
 #include <linux/uaccess.h>
+#include <linux/xarray.h>
 
 #include <asm/delay.h>
 #include <asm/firmware.h>
@@ -37,6 +40,486 @@
 #include <asm/time.h>
 #include <asm/udbg.h>
 
+struct rtas_filter {
+	/* Indexes into the args buffer, -1 if not used */
+	const int buf_idx1;
+	const int size_idx1;
+	const int buf_idx2;
+	const int size_idx2;
+	/*
+	 * Assumed buffer size per the spec if the function does not
+	 * have a size parameter, e.g. ibm,errinjct. 0 if unused.
+	 */
+	const int fixed_size;
+};
+
+/**
+ * struct rtas_function - Descriptor for RTAS functions.
+ *
+ * @token: Value of @name if it exists under the /rtas node.
+ * @name: Function name.
+ * @filter: If non-NULL, invoking this function via the rtas syscall is
+ *          generally allowed, and @filter describes constraints on the
+ *          arguments. See also @banned_for_syscall_on_le.
+ * @banned_for_syscall_on_le: Set when call via sys_rtas is generally allowed
+ *                            but specifically restricted on ppc64le. Such
+ *                            functions are believed to have no users on
+ *                            ppc64le, and we want to keep it that way. It does
+ *                            not make sense for this to be set when @filter
+ *                            is NULL.
+ */
+struct rtas_function {
+	s32 token;
+	const bool banned_for_syscall_on_le:1;
+	const char * const name;
+	const struct rtas_filter *filter;
+};
+
+static struct rtas_function rtas_function_table[] __ro_after_init = {
+	[rtas_fnidx(CHECK_EXCEPTION)] = {
+		.name = "check-exception",
+	},
+	[rtas_fnidx(DISPLAY_CHARACTER)] = {
+		.name = "display-character",
+		.filter = &(const struct rtas_filter) {
+			.buf_idx1 = -1, .size_idx1 = -1,
+			.buf_idx2 = -1, .size_idx2 = -1,
+		},
+	},
+	[rtas_fnidx(EVENT_SCAN)] = {
+		.name = "event-scan",
+	},
+	[rtas_fnidx(FREEZE_TIME_BASE)] = {
+		.name = "freeze-time-base",
+	},
+	[rtas_fnidx(GET_POWER_LEVEL)] = {
+		.name = "get-power-level",
+		.filter = &(const struct rtas_filter) {
+			.buf_idx1 = -1, .size_idx1 = -1,
+			.buf_idx2 = -1, .size_idx2 = -1,
+		},
+	},
+	[rtas_fnidx(GET_SENSOR_STATE)] = {
+		.name = "get-sensor-state",
+		.filter = &(const struct rtas_filter) {
+			.buf_idx1 = -1, .size_idx1 = -1,
+			.buf_idx2 = -1, .size_idx2 = -1,
+		},
+	},
+	[rtas_fnidx(GET_TERM_CHAR)] = {
+		.name = "get-term-char",
+	},
+	[rtas_fnidx(GET_TIME_OF_DAY)] = {
+		.name = "get-time-of-day",
+		.filter = &(const struct rtas_filter) {
+			.buf_idx1 = -1, .size_idx1 = -1,
+			.buf_idx2 = -1, .size_idx2 = -1,
+		},
+	},
+	[rtas_fnidx(IBM_ACTIVATE_FIRMWARE)] = {
+		.name = "ibm,activate-firmware",
+		.filter = &(const struct rtas_filter) {
+			.buf_idx1 = -1, .size_idx1 = -1,
+			.buf_idx2 = -1, .size_idx2 = -1,
+		},
+	},
+	[rtas_fnidx(IBM_CBE_START_PTCAL)] = {
+		.name = "ibm,cbe-start-ptcal",
+	},
+	[rtas_fnidx(IBM_CBE_STOP_PTCAL)] = {
+		.name = "ibm,cbe-stop-ptcal",
+	},
+	[rtas_fnidx(IBM_CHANGE_MSI)] = {
+		.name = "ibm,change-msi",
+	},
+	[rtas_fnidx(IBM_CLOSE_ERRINJCT)] = {
+		.name = "ibm,close-errinjct",
+		.filter = &(const struct rtas_filter) {
+			.buf_idx1 = -1, .size_idx1 = -1,
+			.buf_idx2 = -1, .size_idx2 = -1,
+		},
+	},
+	[rtas_fnidx(IBM_CONFIGURE_BRIDGE)] = {
+		.name = "ibm,configure-bridge",
+	},
+	[rtas_fnidx(IBM_CONFIGURE_CONNECTOR)] = {
+		.name = "ibm,configure-connector",
+		.filter = &(const struct rtas_filter) {
+			.buf_idx1 = 0, .size_idx1 = -1,
+			.buf_idx2 = 1, .size_idx2 = -1,
+			.fixed_size = 4096,
+		},
+	},
+	[rtas_fnidx(IBM_CONFIGURE_KERNEL_DUMP)] = {
+		.name = "ibm,configure-kernel-dump",
+	},
+	[rtas_fnidx(IBM_CONFIGURE_PE)] = {
+		.name = "ibm,configure-pe",
+	},
+	[rtas_fnidx(IBM_CREATE_PE_DMA_WINDOW)] = {
+		.name = "ibm,create-pe-dma-window",
+	},
+	[rtas_fnidx(IBM_DISPLAY_MESSAGE)] = {
+		.name = "ibm,display-message",
+		.filter = &(const struct rtas_filter) {
+			.buf_idx1 = 0, .size_idx1 = -1,
+			.buf_idx2 = -1, .size_idx2 = -1,
+		},
+	},
+	[rtas_fnidx(IBM_ERRINJCT)] = {
+		.name = "ibm,errinjct",
+		.filter = &(const struct rtas_filter) {
+			.buf_idx1 = 2, .size_idx1 = -1,
+			.buf_idx2 = -1, .size_idx2 = -1,
+			.fixed_size = 1024,
+		},
+	},
+	[rtas_fnidx(IBM_EXTI2C)] = {
+		.name = "ibm,exti2c",
+	},
+	[rtas_fnidx(IBM_GET_CONFIG_ADDR_INFO)] = {
+		.name = "ibm,get-config-addr-info",
+	},
+	[rtas_fnidx(IBM_GET_CONFIG_ADDR_INFO2)] = {
+		.name = "ibm,get-config-addr-info2",
+		.filter = &(const struct rtas_filter) {
+			.buf_idx1 = -1, .size_idx1 = -1,
+			.buf_idx2 = -1, .size_idx2 = -1,
+		},
+	},
+	[rtas_fnidx(IBM_GET_DYNAMIC_SENSOR_STATE)] = {
+		.name = "ibm,get-dynamic-sensor-state",
+		.filter = &(const struct rtas_filter) {
+			.buf_idx1 = 1, .size_idx1 = -1,
+			.buf_idx2 = -1, .size_idx2 = -1,
+		},
+	},
+	[rtas_fnidx(IBM_GET_INDICES)] = {
+		.name = "ibm,get-indices",
+		.filter = &(const struct rtas_filter) {
+			.buf_idx1 = 2, .size_idx1 = 3,
+			.buf_idx2 = -1, .size_idx2 = -1,
+		},
+	},
+	[rtas_fnidx(IBM_GET_RIO_TOPOLOGY)] = {
+		.name = "ibm,get-rio-topology",
+	},
+	[rtas_fnidx(IBM_GET_SYSTEM_PARAMETER)] = {
+		.name = "ibm,get-system-parameter",
+		.filter = &(const struct rtas_filter) {
+			.buf_idx1 = 1, .size_idx1 = 2,
+			.buf_idx2 = -1, .size_idx2 = -1,
+		},
+	},
+	[rtas_fnidx(IBM_GET_VPD)] = {
+		.name = "ibm,get-vpd",
+		.filter = &(const struct rtas_filter) {
+			.buf_idx1 = 0, .size_idx1 = -1,
+			.buf_idx2 = 1, .size_idx2 = 2,
+		},
+	},
+	[rtas_fnidx(IBM_GET_XIVE)] = {
+		.name = "ibm,get-xive",
+	},
+	[rtas_fnidx(IBM_INT_OFF)] = {
+		.name = "ibm,int-off",
+	},
+	[rtas_fnidx(IBM_INT_ON)] = {
+		.name = "ibm,int-on",
+	},
+	[rtas_fnidx(IBM_IO_QUIESCE_ACK)] = {
+		.name = "ibm,io-quiesce-ack",
+	},
+	[rtas_fnidx(IBM_LPAR_PERFTOOLS)] = {
+		.name = "ibm,lpar-perftools",
+		.filter = &(const struct rtas_filter) {
+			.buf_idx1 = 2, .size_idx1 = 3,
+			.buf_idx2 = -1, .size_idx2 = -1,
+		},
+	},
+	[rtas_fnidx(IBM_MANAGE_FLASH_IMAGE)] = {
+		.name = "ibm,manage-flash-image",
+	},
+	[rtas_fnidx(IBM_MANAGE_STORAGE_PRESERVATION)] = {
+		.name = "ibm,manage-storage-preservation",
+	},
+	[rtas_fnidx(IBM_NMI_INTERLOCK)] = {
+		.name = "ibm,nmi-interlock",
+	},
+	[rtas_fnidx(IBM_NMI_REGISTER)] = {
+		.name = "ibm,nmi-register",
+	},
+	[rtas_fnidx(IBM_OPEN_ERRINJCT)] = {
+		.name = "ibm,open-errinjct",
+		.filter = &(const struct rtas_filter) {
+			.buf_idx1 = -1, .size_idx1 = -1,
+			.buf_idx2 = -1, .size_idx2 = -1,
+		},
+	},
+	[rtas_fnidx(IBM_OPEN_SRIOV_ALLOW_UNFREEZE)] = {
+		.name = "ibm,open-sriov-allow-unfreeze",
+	},
+	[rtas_fnidx(IBM_OPEN_SRIOV_MAP_PE_NUMBER)] = {
+		.name = "ibm,open-sriov-map-pe-number",
+	},
+	[rtas_fnidx(IBM_OS_TERM)] = {
+		.name = "ibm,os-term",
+	},
+	[rtas_fnidx(IBM_PARTNER_CONTROL)] = {
+		.name = "ibm,partner-control",
+	},
+	[rtas_fnidx(IBM_PHYSICAL_ATTESTATION)] = {
+		.name = "ibm,physical-attestation",
+		.filter = &(const struct rtas_filter) {
+			.buf_idx1 = 0, .size_idx1 = 1,
+			.buf_idx2 = -1, .size_idx2 = -1,
+		},
+	},
+	[rtas_fnidx(IBM_PLATFORM_DUMP)] = {
+		.name = "ibm,platform-dump",
+		.filter = &(const struct rtas_filter) {
+			.buf_idx1 = 4, .size_idx1 = 5,
+			.buf_idx2 = -1, .size_idx2 = -1,
+		},
+	},
+	[rtas_fnidx(IBM_POWER_OFF_UPS)] = {
+		.name = "ibm,power-off-ups",
+	},
+	[rtas_fnidx(IBM_QUERY_INTERRUPT_SOURCE_NUMBER)] = {
+		.name = "ibm,query-interrupt-source-number",
+	},
+	[rtas_fnidx(IBM_QUERY_PE_DMA_WINDOW)] = {
+		.name = "ibm,query-pe-dma-window",
+	},
+	[rtas_fnidx(IBM_READ_PCI_CONFIG)] = {
+		.name = "ibm,read-pci-config",
+	},
+	[rtas_fnidx(IBM_READ_SLOT_RESET_STATE)] = {
+		.name = "ibm,read-slot-reset-state",
+		.filter = &(const struct rtas_filter) {
+			.buf_idx1 = -1, .size_idx1 = -1,
+			.buf_idx2 = -1, .size_idx2 = -1,
+		},
+	},
+	[rtas_fnidx(IBM_READ_SLOT_RESET_STATE2)] = {
+		.name = "ibm,read-slot-reset-state2",
+	},
+	[rtas_fnidx(IBM_REMOVE_PE_DMA_WINDOW)] = {
+		.name = "ibm,remove-pe-dma-window",
+	},
+	[rtas_fnidx(IBM_RESET_PE_DMA_WINDOWS)] = {
+		.name = "ibm,reset-pe-dma-windows",
+	},
+	[rtas_fnidx(IBM_SCAN_LOG_DUMP)] = {
+		.name = "ibm,scan-log-dump",
+		.filter = &(const struct rtas_filter) {
+			.buf_idx1 = 0, .size_idx1 = 1,
+			.buf_idx2 = -1, .size_idx2 = -1,
+		},
+	},
+	[rtas_fnidx(IBM_SET_DYNAMIC_INDICATOR)] = {
+		.name = "ibm,set-dynamic-indicator",
+		.filter = &(const struct rtas_filter) {
+			.buf_idx1 = 2, .size_idx1 = -1,
+			.buf_idx2 = -1, .size_idx2 = -1,
+		},
+	},
+	[rtas_fnidx(IBM_SET_EEH_OPTION)] = {
+		.name = "ibm,set-eeh-option",
+		.filter = &(const struct rtas_filter) {
+			.buf_idx1 = -1, .size_idx1 = -1,
+			.buf_idx2 = -1, .size_idx2 = -1,
+		},
+	},
+	[rtas_fnidx(IBM_SET_SLOT_RESET)] = {
+		.name = "ibm,set-slot-reset",
+	},
+	[rtas_fnidx(IBM_SET_SYSTEM_PARAMETER)] = {
+		.name = "ibm,set-system-parameter",
+		.filter = &(const struct rtas_filter) {
+			.buf_idx1 = 1, .size_idx1 = -1,
+			.buf_idx2 = -1, .size_idx2 = -1,
+		},
+	},
+	[rtas_fnidx(IBM_SET_XIVE)] = {
+		.name = "ibm,set-xive",
+	},
+	[rtas_fnidx(IBM_SLOT_ERROR_DETAIL)] = {
+		.name = "ibm,slot-error-detail",
+	},
+	[rtas_fnidx(IBM_SUSPEND_ME)] = {
+		.name = "ibm,suspend-me",
+		.banned_for_syscall_on_le = true,
+		.filter = &(const struct rtas_filter) {
+			.buf_idx1 = -1, .size_idx1 = -1,
+			.buf_idx2 = -1, .size_idx2 = -1,
+		},
+	},
+	[rtas_fnidx(IBM_TUNE_DMA_PARMS)] = {
+		.name = "ibm,tune-dma-parms",
+	},
+	[rtas_fnidx(IBM_UPDATE_FLASH_64_AND_REBOOT)] = {
+		.name = "ibm,update-flash-64-and-reboot",
+	},
+	[rtas_fnidx(IBM_UPDATE_NODES)] = {
+		.name = "ibm,update-nodes",
+		.banned_for_syscall_on_le = true,
+		.filter = &(const struct rtas_filter) {
+			.buf_idx1 = 0, .size_idx1 = -1,
+			.buf_idx2 = -1, .size_idx2 = -1,
+			.fixed_size = 4096,
+		},
+	},
+	[rtas_fnidx(IBM_UPDATE_PROPERTIES)] = {
+		.name = "ibm,update-properties",
+		.banned_for_syscall_on_le = true,
+		.filter = &(const struct rtas_filter) {
+			.buf_idx1 = 0, .size_idx1 = -1,
+			.buf_idx2 = -1, .size_idx2 = -1,
+			.fixed_size = 4096,
+		},
+	},
+	[rtas_fnidx(IBM_VALIDATE_FLASH_IMAGE)] = {
+		.name = "ibm,validate-flash-image",
+	},
+	[rtas_fnidx(IBM_WRITE_PCI_CONFIG)] = {
+		.name = "ibm,write-pci-config",
+	},
+	[rtas_fnidx(NVRAM_FETCH)] = {
+		.name = "nvram-fetch",
+	},
+	[rtas_fnidx(NVRAM_STORE)] = {
+		.name = "nvram-store",
+	},
+	[rtas_fnidx(POWER_OFF)] = {
+		.name = "power-off",
+	},
+	[rtas_fnidx(PUT_TERM_CHAR)] = {
+		.name = "put-term-char",
+	},
+	[rtas_fnidx(QUERY_CPU_STOPPED_STATE)] = {
+		.name = "query-cpu-stopped-state",
+	},
+	[rtas_fnidx(READ_PCI_CONFIG)] = {
+		.name = "read-pci-config",
+	},
+	[rtas_fnidx(RTAS_LAST_ERROR)] = {
+		.name = "rtas-last-error",
+	},
+	[rtas_fnidx(SET_INDICATOR)] = {
+		.name = "set-indicator",
+		.filter = &(const struct rtas_filter) {
+			.buf_idx1 = -1, .size_idx1 = -1,
+			.buf_idx2 = -1, .size_idx2 = -1,
+		},
+	},
+	[rtas_fnidx(SET_POWER_LEVEL)] = {
+		.name = "set-power-level",
+		.filter = &(const struct rtas_filter) {
+			.buf_idx1 = -1, .size_idx1 = -1,
+			.buf_idx2 = -1, .size_idx2 = -1,
+		},
+	},
+	[rtas_fnidx(SET_TIME_FOR_POWER_ON)] = {
+		.name = "set-time-for-power-on",
+		.filter = &(const struct rtas_filter) {
+			.buf_idx1 = -1, .size_idx1 = -1,
+			.buf_idx2 = -1, .size_idx2 = -1,
+		},
+	},
+	[rtas_fnidx(SET_TIME_OF_DAY)] = {
+		.name = "set-time-of-day",
+		.filter = &(const struct rtas_filter) {
+			.buf_idx1 = -1, .size_idx1 = -1,
+			.buf_idx2 = -1, .size_idx2 = -1,
+		},
+	},
+	[rtas_fnidx(START_CPU)] = {
+		.name = "start-cpu",
+	},
+	[rtas_fnidx(STOP_SELF)] = {
+		.name = "stop-self",
+	},
+	[rtas_fnidx(SYSTEM_REBOOT)] = {
+		.name = "system-reboot",
+	},
+	[rtas_fnidx(THAW_TIME_BASE)] = {
+		.name = "thaw-time-base",
+	},
+	[rtas_fnidx(WRITE_PCI_CONFIG)] = {
+		.name = "write-pci-config",
+	},
+};
+
+static int rtas_function_cmp(const void *a, const void *b)
+{
+	const struct rtas_function *f1 = a;
+	const struct rtas_function *f2 = b;
+
+	return strcmp(f1->name, f2->name);
+}
+
+/*
+ * Boot-time initialization of the function table needs the lookup to
+ * return a non-const-qualified object. Use rtas_name_to_function()
+ * in all other contexts.
+ */
+static struct rtas_function *__rtas_name_to_function(const char *name)
+{
+	const struct rtas_function key = {
+		.name = name,
+	};
+	struct rtas_function *found;
+
+	found = bsearch(&key, rtas_function_table, ARRAY_SIZE(rtas_function_table),
+			sizeof(rtas_function_table[0]), rtas_function_cmp);
+
+	return found;
+}
+
+static const struct rtas_function *rtas_name_to_function(const char *name)
+{
+	return __rtas_name_to_function(name);
+}
+
+static DEFINE_XARRAY(rtas_token_to_function_xarray);
+
+static int __init rtas_token_to_function_xarray_init(void)
+{
+	int err = 0;
+
+	for (size_t i = 0; i < ARRAY_SIZE(rtas_function_table); ++i) {
+		const struct rtas_function *func = &rtas_function_table[i];
+		const s32 token = func->token;
+
+		if (token == RTAS_UNKNOWN_SERVICE)
+			continue;
+
+		err = xa_err(xa_store(&rtas_token_to_function_xarray,
+				      token, (void *)func, GFP_KERNEL));
+		if (err)
+			break;
+	}
+
+	return err;
+}
+arch_initcall(rtas_token_to_function_xarray_init);
+
+static const struct rtas_function *rtas_token_to_function(s32 token)
+{
+	const struct rtas_function *func;
+
+	if (WARN_ONCE(token < 0, "invalid token %d", token))
+		return NULL;
+
+	func = xa_load(&rtas_token_to_function_xarray, token);
+
+	if (WARN_ONCE(!func, "unexpected failed lookup for token %d", token))
+		return NULL;
+
+	return func;
+}
+
 /* This is here deliberately so it's only used in this file */
 void enter_rtas(unsigned long);
 
@@ -315,9 +798,25 @@ EXPORT_SYMBOL_GPL(rtas_progress);		/* needed by rtas_flash module */
 
 int rtas_token(const char *service)
 {
+	const struct rtas_function *func;
 	const __be32 *tokp;
+
 	if (rtas.dev == NULL)
 		return RTAS_UNKNOWN_SERVICE;
+
+	func = rtas_name_to_function(service);
+	if (func)
+		return func->token;
+	/*
+	 * The caller is looking up a name that is not known to be an
+	 * RTAS function. Either it's a function that needs to be
+	 * added to the table, or they're misusing rtas_token() to
+	 * access non-function properties of the /rtas node. Warn and
+	 * fall back to the legacy behavior.
+	 */
+	WARN_ONCE(1, "unknown function `%s`, should it be added to rtas_function_table?\n",
+		  service);
+
 	tokp = of_get_property(rtas.dev, service, NULL);
 	return tokp ? be32_to_cpu(*tokp) : RTAS_UNKNOWN_SERVICE;
 }
@@ -1089,56 +1588,12 @@ noinstr struct pseries_errorlog *get_pseries_errorlog(struct rtas_error_log *log
  *
  * Accordingly, we filter RTAS requests to check that the call is
  * permitted, and that provided pointers fall within the RMO buffer.
- * The rtas_filters list contains an entry for each permitted call,
- * with the indexes of the parameters which are expected to contain
- * addresses and sizes of buffers allocated inside the RMO buffer.
+ * If a function is allowed to be invoked via the syscall, then its
+ * entry in rtas_function_table points to a rtas_filter that
+ * describes its constraints, with the indexes of the parameters which
+ * are expected to contain addresses and sizes of buffers allocated
+ * inside the RMO buffer.
  */
-struct rtas_filter {
-	const char *name;
-	int token;
-	/* Indexes into the args buffer, -1 if not used */
-	int buf_idx1;
-	int size_idx1;
-	int buf_idx2;
-	int size_idx2;
-
-	int fixed_size;
-};
-
-static struct rtas_filter rtas_filters[] __ro_after_init = {
-	{ "ibm,activate-firmware", -1, -1, -1, -1, -1 },
-	{ "ibm,configure-connector", -1, 0, -1, 1, -1, 4096 },	/* Special cased */
-	{ "display-character", -1, -1, -1, -1, -1 },
-	{ "ibm,display-message", -1, 0, -1, -1, -1 },
-	{ "ibm,errinjct", -1, 2, -1, -1, -1, 1024 },
-	{ "ibm,close-errinjct", -1, -1, -1, -1, -1 },
-	{ "ibm,open-errinjct", -1, -1, -1, -1, -1 },
-	{ "ibm,get-config-addr-info2", -1, -1, -1, -1, -1 },
-	{ "ibm,get-dynamic-sensor-state", -1, 1, -1, -1, -1 },
-	{ "ibm,get-indices", -1, 2, 3, -1, -1 },
-	{ "get-power-level", -1, -1, -1, -1, -1 },
-	{ "get-sensor-state", -1, -1, -1, -1, -1 },
-	{ "ibm,get-system-parameter", -1, 1, 2, -1, -1 },
-	{ "get-time-of-day", -1, -1, -1, -1, -1 },
-	{ "ibm,get-vpd", -1, 0, -1, 1, 2 },
-	{ "ibm,lpar-perftools", -1, 2, 3, -1, -1 },
-	{ "ibm,platform-dump", -1, 4, 5, -1, -1 },		/* Special cased */
-	{ "ibm,read-slot-reset-state", -1, -1, -1, -1, -1 },
-	{ "ibm,scan-log-dump", -1, 0, 1, -1, -1 },
-	{ "ibm,set-dynamic-indicator", -1, 2, -1, -1, -1 },
-	{ "ibm,set-eeh-option", -1, -1, -1, -1, -1 },
-	{ "set-indicator", -1, -1, -1, -1, -1 },
-	{ "set-power-level", -1, -1, -1, -1, -1 },
-	{ "set-time-for-power-on", -1, -1, -1, -1, -1 },
-	{ "ibm,set-system-parameter", -1, 1, -1, -1, -1 },
-	{ "set-time-of-day", -1, -1, -1, -1, -1 },
-#ifdef CONFIG_CPU_BIG_ENDIAN
-	{ "ibm,suspend-me", -1, -1, -1, -1, -1 },
-	{ "ibm,update-nodes", -1, 0, -1, -1, -1, 4096 },
-	{ "ibm,update-properties", -1, 0, -1, -1, -1, 4096 },
-#endif
-	{ "ibm,physical-attestation", -1, 0, 1, -1, -1 },
-};
 
 static bool in_rmo_buf(u32 base, u32 end)
 {
@@ -1152,63 +1607,75 @@ static bool in_rmo_buf(u32 base, u32 end)
 static bool block_rtas_call(int token, int nargs,
 			    struct rtas_args *args)
 {
-	int i;
-
-	for (i = 0; i < ARRAY_SIZE(rtas_filters); i++) {
-		struct rtas_filter *f = &rtas_filters[i];
-		u32 base, size, end;
-
-		if (token != f->token)
-			continue;
-
-		if (f->buf_idx1 != -1) {
-			base = be32_to_cpu(args->args[f->buf_idx1]);
-			if (f->size_idx1 != -1)
-				size = be32_to_cpu(args->args[f->size_idx1]);
-			else if (f->fixed_size)
-				size = f->fixed_size;
-			else
-				size = 1;
-
-			end = base + size - 1;
-
-			/*
-			 * Special case for ibm,platform-dump - NULL buffer
-			 * address is used to indicate end of dump processing
-			 */
-			if (!strcmp(f->name, "ibm,platform-dump") &&
-			    base == 0)
-				return false;
-
-			if (!in_rmo_buf(base, end))
-				goto err;
-		}
-
-		if (f->buf_idx2 != -1) {
-			base = be32_to_cpu(args->args[f->buf_idx2]);
-			if (f->size_idx2 != -1)
-				size = be32_to_cpu(args->args[f->size_idx2]);
-			else if (f->fixed_size)
-				size = f->fixed_size;
-			else
-				size = 1;
-			end = base + size - 1;
-
-			/*
-			 * Special case for ibm,configure-connector where the
-			 * address can be 0
-			 */
-			if (!strcmp(f->name, "ibm,configure-connector") &&
-			    base == 0)
-				return false;
-
-			if (!in_rmo_buf(base, end))
-				goto err;
-		}
-
-		return false;
+	const struct rtas_function *func;
+	const struct rtas_filter *f;
+	u32 base, size, end;
+
+	/*
+	 * If this token doesn't correspond to a function the kernel
+	 * understands, you're not allowed to call it.
+	 */
+	func = rtas_token_to_function(token);
+	if (!func)
+		goto err;
+	/*
+	 * And only functions with filters attached are allowed.
+	 */
+	f = func->filter;
+	if (!f)
+		goto err;
+	/*
+	 * And some functions aren't allowed on LE.
+	 */
+	if (IS_ENABLED(CONFIG_CPU_LITTLE_ENDIAN) && func->banned_for_syscall_on_le)
+		goto err;
+
+	if (f->buf_idx1 != -1) {
+		base = be32_to_cpu(args->args[f->buf_idx1]);
+		if (f->size_idx1 != -1)
+			size = be32_to_cpu(args->args[f->size_idx1]);
+		else if (f->fixed_size)
+			size = f->fixed_size;
+		else
+			size = 1;
+
+		end = base + size - 1;
+
+		/*
+		 * Special case for ibm,platform-dump - NULL buffer
+		 * address is used to indicate end of dump processing
+		 */
+		if (!strcmp(func->name, "ibm,platform-dump") &&
+		    base == 0)
+			return false;
+
+		if (!in_rmo_buf(base, end))
+			goto err;
+	}
+
+	if (f->buf_idx2 != -1) {
+		base = be32_to_cpu(args->args[f->buf_idx2]);
+		if (f->size_idx2 != -1)
+			size = be32_to_cpu(args->args[f->size_idx2]);
+		else if (f->fixed_size)
+			size = f->fixed_size;
+		else
+			size = 1;
+		end = base + size - 1;
+
+		/*
+		 * Special case for ibm,configure-connector where the
+		 * address can be 0
+		 */
+		if (!strcmp(func->name, "ibm,configure-connector") &&
+		    base == 0)
+			return false;
+
+		if (!in_rmo_buf(base, end))
+			goto err;
 	}
 
+	return false;
 err:
 	pr_err_ratelimited("sys_rtas: RTAS call blocked - exploit attempt?\n");
 	pr_err_ratelimited("sys_rtas: token=0x%x, nargs=%d (called by %s)\n",
@@ -1216,14 +1683,6 @@ static bool block_rtas_call(int token, int nargs,
 	return true;
 }
 
-static void __init rtas_syscall_filter_init(void)
-{
-	unsigned int i;
-
-	for (i = 0; i < ARRAY_SIZE(rtas_filters); i++)
-		rtas_filters[i].token = rtas_token(rtas_filters[i].name);
-}
-
 /* We assume to be passed big endian arguments */
 SYSCALL_DEFINE1(rtas, struct rtas_args __user *, uargs)
 {
@@ -1323,6 +1782,54 @@ SYSCALL_DEFINE1(rtas, struct rtas_args __user *, uargs)
 	return 0;
 }
 
+static void __init rtas_function_table_init(void)
+{
+	struct property *prop;
+
+	for (size_t i = 0; i < ARRAY_SIZE(rtas_function_table); ++i) {
+		struct rtas_function *curr = &rtas_function_table[i];
+		struct rtas_function *prior;
+		int cmp;
+
+		curr->token = RTAS_UNKNOWN_SERVICE;
+
+		if (i == 0)
+			continue;
+		/*
+		 * Ensure table is sorted correctly for binary search
+		 * on function names.
+		 */
+		prior = &rtas_function_table[i - 1];
+
+		cmp = strcmp(prior->name, curr->name);
+		if (cmp < 0)
+			continue;
+
+		if (cmp == 0) {
+			pr_err("'%s' has duplicate function table entries\n",
+			       curr->name);
+		} else {
+			pr_err("function table unsorted: '%s' wrongly precedes '%s'\n",
+			       prior->name, curr->name);
+		}
+	}
+
+	for_each_property_of_node(rtas.dev, prop) {
+		struct rtas_function *func;
+
+		if (prop->length != sizeof(u32))
+			continue;
+
+		func = __rtas_name_to_function(prop->name);
+		if (!func)
+			continue;
+
+		func->token = be32_to_cpup((__be32 *)prop->value);
+
+		pr_debug("function %s has token %u\n", func->name, func->token);
+	}
+}
+
 /*
  * Call early during boot, before mem init, to retrieve the RTAS
  * information from the device-tree and allocate the RMO buffer for userland
@@ -1356,6 +1863,9 @@ void __init rtas_initialize(void)
 
 	init_error_log_max();
 
+	/* Must be called before any function token lookups */
+	rtas_function_table_init();
+
 	/*
 	 * Discover these now to avoid device tree lookups in the
 	 * panic path.
@@ -1381,7 +1891,6 @@ void __init rtas_initialize(void)
 #endif
 	ibm_open_errinjct_token = rtas_token("ibm,open-errinjct");
 	ibm_errinjct_token = rtas_token("ibm,errinjct");
-	rtas_syscall_filter_init();
 }
 
 int __init early_init_dt_scan_rtas(unsigned long node,

-- 
2.39.1
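
For readers following along outside the kernel tree, the name lookup
above can be approximated in userspace. This is only a sketch under
stated assumptions: struct fn_entry, fn_table, fn_cmp(),
fn_table_is_sorted() and fn_lookup() are hypothetical stand-ins for
struct rtas_function, rtas_function_table and the helpers in the patch,
and the table holds just a handful of names. It shows why the boot-time
sortedness check matters: bsearch() only gives correct answers on a
table that is strictly ordered by the same comparison function.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))
#define UNKNOWN_SERVICE (-1)

/* Hypothetical, trimmed-down stand-in for struct rtas_function. */
struct fn_entry {
	const char *name;
	int token;
};

/* Must stay sorted by name (strcmp order) for bsearch() to work. */
static struct fn_entry fn_table[] = {
	{ "event-scan",      UNKNOWN_SERVICE },
	{ "get-time-of-day", UNKNOWN_SERVICE },
	{ "set-time-of-day", UNKNOWN_SERVICE },
	{ "start-cpu",       UNKNOWN_SERVICE },
	{ "stop-self",       UNKNOWN_SERVICE },
};

static int fn_cmp(const void *a, const void *b)
{
	const struct fn_entry *f1 = a;
	const struct fn_entry *f2 = b;

	return strcmp(f1->name, f2->name);
}

/* Returns 1 if each entry sorts strictly after its predecessor. */
static int fn_table_is_sorted(void)
{
	for (size_t i = 1; i < ARRAY_SIZE(fn_table); ++i) {
		/* >= 0 catches both duplicates and misordered names */
		if (strcmp(fn_table[i - 1].name, fn_table[i].name) >= 0)
			return 0;
	}
	return 1;
}

/* Binary search by name; returns NULL when the name is not known. */
static struct fn_entry *fn_lookup(const char *name)
{
	const struct fn_entry key = { .name = name };

	return bsearch(&key, fn_table, ARRAY_SIZE(fn_table),
		       sizeof(fn_table[0]), fn_cmp);
}
```

A caller would run fn_table_is_sorted() once at startup, then fill in
tokens through fn_lookup() as names are discovered, much as
rtas_function_table_init() does while walking the /rtas node's
properties.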



* [PATCH v2 08/19] powerpc/rtas: strengthen do_enter_rtas() type safety, drop inline
  2023-02-06 18:54 ` Nathan Lynch via B4 Submission Endpoint
@ 2023-02-06 18:54   ` Nathan Lynch via B4 Submission Endpoint
  -1 siblings, 0 replies; 50+ messages in thread
From: Nathan Lynch @ 2023-02-06 18:54 UTC (permalink / raw)
  To: Michael Ellerman, Nicholas Piggin, Christophe Leroy, Kajol Jain,
	Laurent Dufour, Mahesh J Salgaonkar, Andrew Donnellan,
	Nick Child
  Cc: linuxppc-dev, Nathan Lynch

Make do_enter_rtas() take a pointer to struct rtas_args and do the
__pa() conversion in one place instead of leaving it to callers. This
also makes it possible to introduce enter/exit tracepoints that access
the rtas_args struct fields.

There's no apparent reason to force inlining of do_enter_rtas()
either, and it seems to bloat the code a bit. Let the compiler decide.
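
The shape of the change can be illustrated with a small userspace
sketch. Everything here is hypothetical: demo_pa() stands in for
__pa(), demo_enter_fw() for the assembly entry point, and the
"physical address" is just a byte offset into a static array. The point
is only that callers hand over a typed pointer and the address
conversion happens in exactly one function.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct rtas_args. */
struct demo_args {
	uint32_t token;
	uint32_t nargs;
	uint32_t nret;
	uint32_t args[16];
};

/* Simulated memory; offset 0 plays the role of physical address 0. */
static struct demo_args demo_ram[4];

/* Records what the fake firmware entry point was handed. */
static unsigned long demo_last_entry_addr;

/* Stand-in for __pa(): convert a pointer into a raw address. */
static unsigned long demo_pa(const void *vaddr)
{
	return (unsigned long)((const unsigned char *)vaddr -
			       (const unsigned char *)demo_ram);
}

/* Stand-in for the asm entry point, which only understands addresses. */
static void demo_enter_fw(unsigned long paddr)
{
	demo_last_entry_addr = paddr;
}

/*
 * Typed wrapper: passing an unsigned long here by mistake draws a
 * compiler diagnostic, and no caller needs to remember the conversion.
 */
static void demo_do_enter(struct demo_args *args)
{
	demo_enter_fw(demo_pa(args));
}
```

With this arrangement a call site is simply demo_do_enter(&demo_ram[2]),
and the wrapper also has the struct fields at hand should it later want
to inspect them.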

Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
Reviewed-by: Andrew Donnellan <ajd@linux.ibm.com>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
---
 arch/powerpc/kernel/rtas.c | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/arch/powerpc/kernel/rtas.c b/arch/powerpc/kernel/rtas.c
index 2804382c74b1..52c1ed7869b8 100644
--- a/arch/powerpc/kernel/rtas.c
+++ b/arch/powerpc/kernel/rtas.c
@@ -523,7 +523,7 @@ static const struct rtas_function *rtas_token_to_function(s32 token)
 /* This is here deliberately so it's only used in this file */
 void enter_rtas(unsigned long);
 
-static inline void do_enter_rtas(unsigned long args)
+static void do_enter_rtas(struct rtas_args *args)
 {
 	unsigned long msr;
 
@@ -538,7 +538,7 @@ static inline void do_enter_rtas(unsigned long args)
 
 	hard_irq_disable(); /* Ensure MSR[EE] is disabled on PPC64 */
 
-	enter_rtas(args);
+	enter_rtas(__pa(args));
 
 	srr_regs_clobbered(); /* rtas uses SRRs, invalidate */
 }
@@ -892,7 +892,7 @@ static char *__fetch_rtas_last_error(char *altbuf)
 	save_args = rtas_args;
 	rtas_args = err_args;
 
-	do_enter_rtas(__pa(&rtas_args));
+	do_enter_rtas(&rtas_args);
 
 	err_args = rtas_args;
 	rtas_args = save_args;
@@ -939,7 +939,7 @@ va_rtas_call_unlocked(struct rtas_args *args, int token, int nargs, int nret,
 	for (i = 0; i < nret; ++i)
 		args->rets[i] = 0;
 
-	do_enter_rtas(__pa(args));
+	do_enter_rtas(args);
 }
 
 void rtas_call_unlocked(struct rtas_args *args, int token, int nargs, int nret, ...)
@@ -1756,7 +1756,7 @@ SYSCALL_DEFINE1(rtas, struct rtas_args __user *, uargs)
 	raw_spin_lock_irqsave(&rtas_lock, flags);
 
 	rtas_args = args;
-	do_enter_rtas(__pa(&rtas_args));
+	do_enter_rtas(&rtas_args);
 	args = rtas_args;
 
 	/* A -1 return code indicates that the last command couldn't

-- 
2.39.1



* [PATCH v2 09/19] powerpc/tracing: tracepoints for RTAS entry and exit
  2023-02-06 18:54 ` Nathan Lynch via B4 Submission Endpoint
@ 2023-02-06 18:54   ` Nathan Lynch via B4 Submission Endpoint
  -1 siblings, 0 replies; 50+ messages in thread
From: Nathan Lynch @ 2023-02-06 18:54 UTC (permalink / raw)
  To: Michael Ellerman, Nicholas Piggin, Christophe Leroy, Kajol Jain,
	Laurent Dufour, Mahesh J Salgaonkar, Andrew Donnellan,
	Nick Child
  Cc: linuxppc-dev, Nathan Lynch

Add two sets of tracepoints to be used around RTAS entry:

* rtas_input/rtas_output, which emit the function name, its inputs,
  the returned status, and any other outputs. These produce an API-level
  record of OS<->RTAS activity.

* rtas_ll_entry/rtas_ll_exit, which are lower-level and emit the
  entire contents of the parameter block (aka rtas_args) on entry and
  exit. Likely useful only for debugging.

With uses of these tracepoints in do_enter_rtas() to be added in the
following patch, here are examples of the get-time-of-day and
event-scan functions as rendered by trace-cmd (with some multi-line
formatting manually imposed on the rtas_ll_* entries to avoid
extremely long lines in the commit message):

cat-36800 [059]  4978.518303: rtas_input:           get-time-of-day arguments:
cat-36800 [059]  4978.518306: rtas_ll_entry:        token=3 nargs=0 nret=8
                                                    params: [0]=0x00000000 [1]=0x00000000 [2]=0x00000000 [3]=0x00000000
                                                            [4]=0x00000000 [5]=0x00000000 [6]=0x00000000 [7]=0x00000000
							    [8]=0x00000000 [9]=0x00000000 [10]=0x00000000 [11]=0x00000000
							    [12]=0x00000000 [13]=0x00000000 [14]=0x00000000 [15]=0x00000000
cat-36800 [059]  4978.518366: rtas_ll_exit:         token=3 nargs=0 nret=8
                                                    params: [0]=0x00000000 [1]=0x000007e6 [2]=0x0000000b [3]=0x00000001
						            [4]=0x00000000 [5]=0x0000000e [6]=0x00000008 [7]=0x2e0dac40
							    [8]=0x00000000 [9]=0x00000000 [10]=0x00000000 [11]=0x00000000
							    [12]=0x00000000 [13]=0x00000000 [14]=0x00000000 [15]=0x00000000
cat-36800 [059]  4978.518366: rtas_output:          get-time-of-day status: 0, other outputs: 2022 11 1 0 14 8 772648000

kworker/39:1-336   [039]  4982.731623: rtas_input:           event-scan arguments: 4294967295 0 80484920 2048
kworker/39:1-336   [039]  4982.731626: rtas_ll_entry:        token=6 nargs=4 nret=1
                                                             params: [0]=0xffffffff [1]=0x00000000 [2]=0x04cc1a38 [3]=0x00000800
							             [4]=0x00000000 [5]=0x0000000e [6]=0x00000008 [7]=0x2e0dac40
								     [8]=0x00000000 [9]=0x00000000 [10]=0x00000000 [11]=0x00000000
								     [12]=0x00000000 [13]=0x00000000 [14]=0x00000000 [15]=0x00000000
kworker/39:1-336   [039]  4982.731676: rtas_ll_exit:         token=6 nargs=4 nret=1
                                                             params: [0]=0xffffffff [1]=0x00000000 [2]=0x04cc1a38 [3]=0x00000800
							             [4]=0x00000001 [5]=0x0000000e [6]=0x00000008 [7]=0x2e0dac40
								     [8]=0x00000000 [9]=0x00000000 [10]=0x00000000 [11]=0x00000000
								     [12]=0x00000000 [13]=0x00000000 [14]=0x00000000 [15]=0x00000000
kworker/39:1-336   [039]  4982.731677: rtas_output:          event-scan status: 1, other outputs:
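
As a reading aid for the traces above, here is a userspace sketch of
how the flat parameter block maps onto the API-level view. It assumes
the usual layout in which inputs occupy params[0..nargs) and the return
values follow immediately, with the status first. The struct and helper
names (param_block, pb_inputs(), pb_status(), pb_other_outputs()) are
hypothetical, and the values are the CPU-endian numbers copied from the
rtas_ll_exit lines above.

```c
#include <stddef.h>
#include <stdint.h>

#define PARAM_MAX 16

/* Hypothetical CPU-endian copy of what rtas_ll_exit prints. */
struct param_block {
	uint32_t token;
	uint32_t nargs;
	uint32_t nret;
	uint32_t params[PARAM_MAX];
};

/* Inputs occupy params[0 .. nargs). */
static const uint32_t *pb_inputs(const struct param_block *pb)
{
	return &pb->params[0];
}

/* The status word is the first return value, at params[nargs]. */
static int32_t pb_status(const struct param_block *pb)
{
	return (int32_t)pb->params[pb->nargs];
}

/* Remaining outputs occupy params[nargs + 1 .. nargs + nret). */
static const uint32_t *pb_other_outputs(const struct param_block *pb,
					size_t *count)
{
	*count = pb->nret ? pb->nret - 1 : 0;
	return &pb->params[pb->nargs + 1];
}

/* get-time-of-day on exit, from the trace above. */
static const struct param_block gtod_exit = {
	.token = 3, .nargs = 0, .nret = 8,
	.params = { 0x00000000, 0x000007e6, 0x0000000b, 0x00000001,
		    0x00000000, 0x0000000e, 0x00000008, 0x2e0dac40 },
};

/* event-scan on exit, from the trace above. */
static const struct param_block event_scan_exit = {
	.token = 6, .nargs = 4, .nret = 1,
	.params = { 0xffffffff, 0x00000000, 0x04cc1a38, 0x00000800,
		    0x00000001, 0x0000000e, 0x00000008, 0x2e0dac40 },
};
```

Decoding gtod_exit this way yields status 0 and the seven other outputs
shown on the rtas_output line. For event_scan_exit the status lives at
params[4], and params[5] onward is stale data left over from the
earlier call, which is why rtas_output prints no other outputs.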

Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
---
 arch/powerpc/include/asm/trace.h | 103 +++++++++++++++++++++++++++++++++++++++
 1 file changed, 103 insertions(+)

diff --git a/arch/powerpc/include/asm/trace.h b/arch/powerpc/include/asm/trace.h
index 08cd60cd70b7..82cc2c6704e6 100644
--- a/arch/powerpc/include/asm/trace.h
+++ b/arch/powerpc/include/asm/trace.h
@@ -119,6 +119,109 @@ TRACE_EVENT_FN_COND(hcall_exit,
 );
 #endif
 
+#ifdef CONFIG_PPC_RTAS
+
+#include <asm/rtas-types.h>
+
+TRACE_EVENT(rtas_input,
+
+	TP_PROTO(struct rtas_args *rtas_args, const char *name),
+
+	TP_ARGS(rtas_args, name),
+
+	TP_STRUCT__entry(
+		__field(__u32, nargs)
+		__string(name, name)
+		__dynamic_array(__u32, inputs, be32_to_cpu(rtas_args->nargs))
+	),
+
+	TP_fast_assign(
+		__entry->nargs = be32_to_cpu(rtas_args->nargs);
+		__assign_str(name, name);
+		be32_to_cpu_array(__get_dynamic_array(inputs), rtas_args->args, __entry->nargs);
+	),
+
+	TP_printk("%s arguments: %s", __get_str(name),
+		  __print_array(__get_dynamic_array(inputs), __entry->nargs, 4)
+	)
+);
+
+TRACE_EVENT(rtas_output,
+
+	TP_PROTO(struct rtas_args *rtas_args, const char *name),
+
+	TP_ARGS(rtas_args, name),
+
+	TP_STRUCT__entry(
+		__field(__u32, nr_other)
+		__field(__s32, status)
+		__string(name, name)
+		__dynamic_array(__u32, other_outputs, be32_to_cpu(rtas_args->nret) - 1)
+	),
+
+	TP_fast_assign(
+		__entry->nr_other = be32_to_cpu(rtas_args->nret) - 1;
+		__entry->status = be32_to_cpu(rtas_args->rets[0]);
+		__assign_str(name, name);
+		be32_to_cpu_array(__get_dynamic_array(other_outputs),
+				  &rtas_args->rets[1], __entry->nr_other);
+	),
+
+	TP_printk("%s status: %d, other outputs: %s", __get_str(name), __entry->status,
+		  __print_array(__get_dynamic_array(other_outputs),
+				__entry->nr_other, 4)
+	)
+);
+
+DECLARE_EVENT_CLASS(rtas_parameter_block,
+
+	TP_PROTO(struct rtas_args *rtas_args),
+
+	TP_ARGS(rtas_args),
+
+	TP_STRUCT__entry(
+		__field(u32, token)
+		__field(u32, nargs)
+		__field(u32, nret)
+		__array(__u32, params, 16)
+	),
+
+	TP_fast_assign(
+		__entry->token = be32_to_cpu(rtas_args->token);
+		__entry->nargs = be32_to_cpu(rtas_args->nargs);
+		__entry->nret = be32_to_cpu(rtas_args->nret);
+		be32_to_cpu_array(__entry->params, rtas_args->args, ARRAY_SIZE(rtas_args->args));
+	),
+
+	TP_printk("token=%u nargs=%u nret=%u params:"
+		  " [0]=0x%08x [1]=0x%08x [2]=0x%08x [3]=0x%08x"
+		  " [4]=0x%08x [5]=0x%08x [6]=0x%08x [7]=0x%08x"
+		  " [8]=0x%08x [9]=0x%08x [10]=0x%08x [11]=0x%08x"
+		  " [12]=0x%08x [13]=0x%08x [14]=0x%08x [15]=0x%08x",
+		  __entry->token, __entry->nargs, __entry->nret,
+		  __entry->params[0], __entry->params[1], __entry->params[2], __entry->params[3],
+		  __entry->params[4], __entry->params[5], __entry->params[6], __entry->params[7],
+		  __entry->params[8], __entry->params[9], __entry->params[10], __entry->params[11],
+		  __entry->params[12], __entry->params[13], __entry->params[14], __entry->params[15]
+	)
+);
+
+DEFINE_EVENT(rtas_parameter_block, rtas_ll_entry,
+
+	TP_PROTO(struct rtas_args *rtas_args),
+
+	TP_ARGS(rtas_args)
+);
+
+DEFINE_EVENT(rtas_parameter_block, rtas_ll_exit,
+
+	TP_PROTO(struct rtas_args *rtas_args),
+
+	TP_ARGS(rtas_args)
+);
+
+#endif /* CONFIG_PPC_RTAS */
+
 #ifdef CONFIG_PPC_POWERNV
 extern int opal_tracepoint_regfunc(void);
 extern void opal_tracepoint_unregfunc(void);

-- 
2.39.1



* [PATCH v2 10/19] powerpc/rtas: add tracepoints around RTAS entry
  2023-02-06 18:54 ` Nathan Lynch via B4 Submission Endpoint
@ 2023-02-06 18:54   ` Nathan Lynch via B4 Submission Endpoint
  -1 siblings, 0 replies; 50+ messages in thread
From: Nathan Lynch @ 2023-02-06 18:54 UTC (permalink / raw)
  To: Michael Ellerman, Nicholas Piggin, Christophe Leroy, Kajol Jain,
	Laurent Dufour, Mahesh J Salgaonkar, Andrew Donnellan,
	Nick Child
  Cc: linuxppc-dev, Nathan Lynch

Decompose the RTAS entry C code into tracing and non-tracing variants,
calling the just-added tracepoints in the tracing-enabled path. Skip
tracing in contexts known to be unsafe (real mode, CPU offline).
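
The gating decision can be modelled in userspace roughly as follows.
This is a sketch, not the kernel code: the DEMO_MSR_* bit values are
illustrative (the kernel takes MSR_IR and MSR_DR from its own headers),
the CPU-online state is passed in as a plain flag, and the two counters
merely stand in for the traced and untraced entry paths.

```c
#include <stdbool.h>

/* Illustrative bit positions for the two relocation bits. */
#define DEMO_MSR_IR (1UL << 5)	/* instruction relocation enabled */
#define DEMO_MSR_DR (1UL << 4)	/* data relocation enabled */

/* Count how often each path was taken. */
static int demo_traced_calls;
static int demo_untraced_calls;

/*
 * Tracing is allowed only when the CPU is online and both relocation
 * bits are set, i.e. we are not running in real mode.
 */
static bool demo_can_trace(bool cpu_is_online, unsigned long msr)
{
	const unsigned long mask = DEMO_MSR_IR | DEMO_MSR_DR;

	return cpu_is_online && (msr & mask) == mask;
}

/* Dispatch to the traced or untraced variant of the entry path. */
static void demo_enter(bool cpu_is_online, unsigned long msr)
{
	if (demo_can_trace(cpu_is_online, msr))
		demo_traced_calls++;	/* tracepoints around the call */
	else
		demo_untraced_calls++;	/* bare firmware call only */
}
```

Comparing against the full mask, rather than testing for any nonzero
bit, means a context with only one of the two relocation bits set is
treated the same as real mode and takes the untraced path.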

Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
---
 arch/powerpc/kernel/rtas.c | 59 +++++++++++++++++++++++++++++++++++++++++-----
 1 file changed, 53 insertions(+), 6 deletions(-)

diff --git a/arch/powerpc/kernel/rtas.c b/arch/powerpc/kernel/rtas.c
index 52c1ed7869b8..3290f25b9b34 100644
--- a/arch/powerpc/kernel/rtas.c
+++ b/arch/powerpc/kernel/rtas.c
@@ -38,6 +38,7 @@
 #include <asm/page.h>
 #include <asm/rtas.h>
 #include <asm/time.h>
+#include <asm/trace.h>
 #include <asm/udbg.h>
 
 struct rtas_filter {
@@ -523,24 +524,70 @@ static const struct rtas_function *rtas_token_to_function(s32 token)
 /* This is here deliberately so it's only used in this file */
 void enter_rtas(unsigned long);
 
+static void __do_enter_rtas(struct rtas_args *args)
+{
+	enter_rtas(__pa(args));
+	srr_regs_clobbered(); /* rtas uses SRRs, invalidate */
+}
+
+static void __do_enter_rtas_trace(struct rtas_args *args)
+{
+	const char *name = NULL;
+	/*
+	 * If the tracepoints that consume the function name aren't
+	 * active, avoid the lookup.
+	 */
+	if ((trace_rtas_input_enabled() || trace_rtas_output_enabled())) {
+		const s32 token = be32_to_cpu(args->token);
+		const struct rtas_function *func = rtas_token_to_function(token);
+
+		name = func->name;
+	}
+
+	trace_rtas_input(args, name);
+	trace_rtas_ll_entry(args);
+
+	__do_enter_rtas(args);
+
+	trace_rtas_ll_exit(args);
+	trace_rtas_output(args, name);
+}
+
 static void do_enter_rtas(struct rtas_args *args)
 {
-	unsigned long msr;
-
+	const unsigned long msr = mfmsr();
+	/*
+	 * Situations where we want to skip any active tracepoints for
+	 * safety reasons:
+	 *
+	 * 1. The last code executed on an offline CPU as it stops,
+	 *    i.e. we're about to call stop-self. The tracepoints'
+	 *    function name lookup uses xarray, which uses RCU, which
+	 *    isn't valid to call on an offline CPU.  Any events
+	 *    emitted on an offline CPU will be discarded anyway.
+	 *
+	 * 2. In real mode, as when invoking ibm,nmi-interlock from
+	 *    the pseries MCE handler. We cannot count on trace
+	 *    buffers or the entries in rtas_token_to_function_xarray
+	 *    to be contained in the RMO.
+	 */
+	const unsigned long mask = MSR_IR | MSR_DR;
+	const bool can_trace = likely(cpu_online(raw_smp_processor_id()) &&
+				      (msr & mask) == mask);
 	/*
 	 * Make sure MSR[RI] is currently enabled as it will be forced later
 	 * in enter_rtas.
 	 */
-	msr = mfmsr();
 	BUG_ON(!(msr & MSR_RI));
 
 	BUG_ON(!irqs_disabled());
 
 	hard_irq_disable(); /* Ensure MSR[EE] is disabled on PPC64 */
 
-	enter_rtas(__pa(args));
-
-	srr_regs_clobbered(); /* rtas uses SRRs, invalidate */
+	if (can_trace)
+		__do_enter_rtas_trace(args);
+	else
+		__do_enter_rtas(args);
 }
 
 struct rtas_t rtas;

-- 
2.39.1


^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [PATCH v2 11/19] powerpc/rtas: add work area allocator
  2023-02-06 18:54 ` Nathan Lynch via B4 Submission Endpoint
@ 2023-02-06 18:54   ` Nathan Lynch via B4 Submission Endpoint
  -1 siblings, 0 replies; 50+ messages in thread
From: Nathan Lynch @ 2023-02-06 18:54 UTC (permalink / raw)
  To: Michael Ellerman, Nicholas Piggin, Christophe Leroy, Kajol Jain,
	Laurent Dufour, Mahesh J Salgaonkar, Andrew Donnellan,
	Nick Child
  Cc: linuxppc-dev, Nathan Lynch

Most callers of RTAS functions that take a temporary "work area"
parameter use the statically allocated rtas_data_buf buffer as the
argument. This buffer is protected by a global spinlock. So users of
rtas_data_buf cannot perform sleeping operations while accessing the
buffer.

Most RTAS functions that have a work area parameter can return a
status (-2/990x) that indicates that the caller should retry. Before
retrying, the caller may need to reschedule or sleep (see
rtas_busy_delay() for details). This combination of factors
necessitates uncomfortable constructions like this:

	do {
		spin_lock(&rtas_data_buf_lock);
		rc = rtas_call(token, __pa(rtas_data_buf), ...);
		if (rc == 0) {
			/* parse or copy out rtas_data_buf contents */
		}
		spin_unlock(&rtas_data_buf_lock);
	} while (rtas_busy_delay(rc));

Another unfortunately common way of handling this is for callers to
blithely ignore the possibility of a -2/990x status and hope for the
best.

If users were allowed to perform blocking operations while owning a
work area, the programming model would become less tedious and
error-prone. Users could schedule away, sleep, or perform other
blocking operations without having to release and re-acquire
resources.
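
A userspace sketch of the simpler caller this would permit. Every
sketch_* name below is a hypothetical stand-in, not something this
patch provides: the mock firmware call reports busy (-2) twice before
succeeding, and the mock busy-delay helper only decides whether to
retry, where the kernel's rtas_busy_delay() may also sleep.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical userspace stand-in for a work area descriptor. */
struct sketch_work_area {
	char *buf;
	size_t size;
};

static struct sketch_work_area *sketch_work_area_alloc(size_t size)
{
	struct sketch_work_area *area = malloc(sizeof(*area));

	assert(area);
	area->buf = calloc(1, size);
	assert(area->buf);
	area->size = size;
	return area;
}

static void sketch_work_area_free(struct sketch_work_area *area)
{
	free(area->buf);
	free(area);
}

/* Counts mock firmware invocations so the retries are observable. */
static int sketch_call_count;

/* Mock RTAS call: busy (-2) on the first two calls, then success. */
static int sketch_rtas_call(char *buf, size_t size)
{
	if (++sketch_call_count <= 2)
		return -2;
	strncpy(buf, "ok", size - 1);
	return 0;
}

/* Mock retry decision: true for -2 and for 9900..9905. Never sleeps. */
static int sketch_busy_delay(int rc)
{
	return rc == -2 || (rc >= 9900 && rc <= 9905);
}

/* The caller owns the buffer across retries; nothing is dropped or retaken. */
static int sketch_fetch(char *out, size_t outlen)
{
	struct sketch_work_area *area = sketch_work_area_alloc(4096);
	int rc;

	do {
		rc = sketch_rtas_call(area->buf, area->size);
	} while (sketch_busy_delay(rc));

	if (rc == 0) {
		strncpy(out, area->buf, outlen - 1);
		out[outlen - 1] = '\0';
	}

	sketch_work_area_free(area);
	return rc;
}
```

Compared with the spinlock construction above, the buffer is acquired
once, the retry loop contains only the call itself, and the result is
consumed after the loop rather than under a lock.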

We could continue to use a single work area buffer, and convert
rtas_data_buf_lock to a mutex. But that would impose an unnecessarily
coarse serialization on all users. As awkward as the current design
is, it prevents longer running operations that need to repeatedly use
rtas_data_buf from blocking the progress of others.

There are more considerations. One is that while 4KB is fine for all
current in-kernel uses, some RTAS calls can take much smaller buffers,
and some (VPD, platform dumps) would likely benefit from larger
ones. Another is that at least one RTAS function (ibm,get-vpd)
has *two* work area parameters. And finally, we should expect the
number of work area users in the kernel to increase over time as we
introduce lockdown-compatible ABIs to replace less safe use cases
based on sys_rtas/librtas.

So a special-purpose allocator for RTAS work area buffers seems worth
trying.

Properties:

* The backing memory for the allocator is reserved early in boot in
  order to satisfy RTAS addressing requirements, and then managed with
  genalloc.
* Allocations can block, but they never fail (mempool-like).
* Prioritizes first-come, first-served fairness over throughput.
* Early boot allocations before the allocator has been initialized are
  served via an internal static buffer.

Intended to replace rtas_data_buf. New code that needs RTAS work area
buffers should prefer this API.
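
For a feel of the numbers, here is a userspace sketch of the sizing
arithmetic implied by the constants in rtas-work-area.c (256K arena,
minimum allocation of roundup_pow_of_two(80), maximum of half the
arena). The sketch_* helpers are hypothetical reimplementations of
the kernel's roundup_pow_of_two() and ilog2(), and
sketch_granules_for() assumes genalloc rounds each request up to a
whole number of minimum-size granules.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for the kernel's roundup_pow_of_two(). */
static size_t sketch_roundup_pow_of_two(size_t n)
{
	size_t p = 1;

	while (p < n)
		p <<= 1;
	return p;
}

/* Stand-in for the kernel's ilog2(); n must be nonzero. */
static unsigned int sketch_ilog2(size_t n)
{
	unsigned int order = 0;

	assert(n > 0);
	while (n >>= 1)
		order++;
	return order;
}

enum {
	SKETCH_ARENA_SZ = 256 * 1024,			/* SZ_256K */
	SKETCH_MAX_ALLOC_SZ = SKETCH_ARENA_SZ / 2,	/* half the arena */
};

/* Minimum granule: 79-character location code plus NUL, rounded up. */
static size_t sketch_min_alloc_sz(void)
{
	return sketch_roundup_pow_of_two(80);
}

/* Granules consumed by one request, after clamping to the maximum. */
static size_t sketch_granules_for(size_t size)
{
	const size_t granule = sketch_min_alloc_sz();

	if (size > (size_t)SKETCH_MAX_ALLOC_SZ)
		size = SKETCH_MAX_ALLOC_SZ;
	return (size + granule - 1) / granule;
}
```

Under these assumptions the granule is 128 bytes (order 7), the arena
holds 2048 granules, and a 4KB request occupies 32 of them, so the
arena can back 64 concurrent 4KB work areas.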

Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
---
 arch/powerpc/include/asm/rtas-work-area.h |  45 +++++++
 arch/powerpc/kernel/Makefile              |   3 +-
 arch/powerpc/kernel/rtas-work-area.c      | 208 ++++++++++++++++++++++++++++++
 arch/powerpc/kernel/rtas.c                |   3 +
 4 files changed, 258 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/include/asm/rtas-work-area.h b/arch/powerpc/include/asm/rtas-work-area.h
new file mode 100644
index 000000000000..76ccb039cc37
--- /dev/null
+++ b/arch/powerpc/include/asm/rtas-work-area.h
@@ -0,0 +1,45 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+#ifndef POWERPC_RTAS_WORK_AREA_H
+#define POWERPC_RTAS_WORK_AREA_H
+
+#include <linux/types.h>
+
+#include <asm/page.h>
+
+/**
+ * struct rtas_work_area - RTAS work area descriptor.
+ *
+ * Descriptor for a "work area" in PAPR terminology that satisfies
+ * RTAS addressing requirements.
+ */
+struct rtas_work_area {
+	/* private: Use the APIs provided below. */
+	char *buf;
+	size_t size;
+};
+
+struct rtas_work_area *rtas_work_area_alloc(size_t size);
+void rtas_work_area_free(struct rtas_work_area *area);
+
+static inline char *rtas_work_area_raw_buf(const struct rtas_work_area *area)
+{
+	return area->buf;
+}
+
+static inline size_t rtas_work_area_size(const struct rtas_work_area *area)
+{
+	return area->size;
+}
+
+static inline phys_addr_t rtas_work_area_phys(const struct rtas_work_area *area)
+{
+	return __pa(area->buf);
+}
+
+/*
+ * Early setup for the work area allocator. Call from
+ * rtas_initialize() only.
+ */
+int rtas_work_area_reserve_arena(phys_addr_t);
+
+#endif /* POWERPC_RTAS_WORK_AREA_H */
diff --git a/arch/powerpc/kernel/Makefile b/arch/powerpc/kernel/Makefile
index 9b6146056e48..69e652e319a4 100644
--- a/arch/powerpc/kernel/Makefile
+++ b/arch/powerpc/kernel/Makefile
@@ -90,7 +90,8 @@ obj-$(CONFIG_PPC_BOOK3S_IDLE)	+= idle_book3s.o
 procfs-y			:= proc_powerpc.o
 obj-$(CONFIG_PROC_FS)		+= $(procfs-y)
 rtaspci-$(CONFIG_PPC64)-$(CONFIG_PCI)	:= rtas_pci.o
-obj-$(CONFIG_PPC_RTAS)		+= rtas_entry.o rtas.o rtas-rtc.o $(rtaspci-y-y)
+obj-$(CONFIG_PPC_RTAS)		+= rtas_entry.o rtas.o rtas-rtc.o $(rtaspci-y-y) \
+                                   rtas-work-area.o
 obj-$(CONFIG_PPC_RTAS_DAEMON)	+= rtasd.o
 obj-$(CONFIG_RTAS_FLASH)	+= rtas_flash.o
 obj-$(CONFIG_RTAS_PROC)		+= rtas-proc.o
diff --git a/arch/powerpc/kernel/rtas-work-area.c b/arch/powerpc/kernel/rtas-work-area.c
new file mode 100644
index 000000000000..75950e13a0fe
--- /dev/null
+++ b/arch/powerpc/kernel/rtas-work-area.c
@@ -0,0 +1,208 @@
+// SPDX-License-Identifier: GPL-2.0-only
+
+#define pr_fmt(fmt)	"rtas-work-area: " fmt
+
+#include <linux/genalloc.h>
+#include <linux/log2.h>
+#include <linux/kernel.h>
+#include <linux/memblock.h>
+#include <linux/mempool.h>
+#include <linux/minmax.h>
+#include <linux/mutex.h>
+#include <linux/numa.h>
+#include <linux/sizes.h>
+#include <linux/wait.h>
+
+#include <asm/machdep.h>
+#include <asm/rtas-work-area.h>
+
+enum {
+	/*
+	 * Ensure the pool is page-aligned.
+	 */
+	RTAS_WORK_AREA_ARENA_ALIGN = PAGE_SIZE,
+
+	RTAS_WORK_AREA_ARENA_SZ = SZ_256K,
+	/*
+	 * The smallest known work area size is for ibm,get-vpd's
+	 * location code argument, which is limited to 79 characters
+	 * plus 1 nul terminator.
+	 *
+	 * PAPR+ 7.3.20 ibm,get-vpd RTAS Call
+	 * PAPR+ 12.3.2.4 Converged Location Code Rules - Length Restrictions
+	 */
+	RTAS_WORK_AREA_MIN_ALLOC_SZ = roundup_pow_of_two(80),
+	/*
+	 * Don't let a single allocation claim the whole arena.
+	 */
+	RTAS_WORK_AREA_MAX_ALLOC_SZ = RTAS_WORK_AREA_ARENA_SZ / 2,
+};
+
+static struct rtas_work_area_allocator_state {
+	struct gen_pool *gen_pool;
+	char *arena;
+	struct mutex mutex; /* serializes allocations */
+	struct wait_queue_head wqh;
+	mempool_t descriptor_pool;
+	bool available;
+} rwa_state_ = {
+	.mutex = __MUTEX_INITIALIZER(rwa_state_.mutex),
+	.wqh = __WAIT_QUEUE_HEAD_INITIALIZER(rwa_state_.wqh),
+};
+static struct rtas_work_area_allocator_state *rwa_state = &rwa_state_;
+
+/*
+ * A single work area buffer and descriptor to serve requests early in
+ * boot before the allocator is fully initialized.
+ */
+static bool early_work_area_in_use __initdata;
+static char early_work_area_buf[SZ_4K] __initdata;
+static struct rtas_work_area early_work_area __initdata = {
+	.buf = early_work_area_buf,
+	.size = sizeof(early_work_area_buf),
+};
+
+
+static struct rtas_work_area * __init rtas_work_area_alloc_early(size_t size)
+{
+	WARN_ON(size > early_work_area.size);
+	WARN_ON(early_work_area_in_use);
+	early_work_area_in_use = true;
+	memset(early_work_area.buf, 0, early_work_area.size);
+	return &early_work_area;
+}
+
+static void __init rtas_work_area_free_early(struct rtas_work_area *work_area)
+{
+	WARN_ON(work_area != &early_work_area);
+	WARN_ON(!early_work_area_in_use);
+	early_work_area_in_use = false;
+}
+
+struct rtas_work_area * __ref rtas_work_area_alloc(size_t size)
+{
+	struct rtas_work_area *area;
+	unsigned long addr;
+
+	might_sleep();
+
+	WARN_ON(size > RTAS_WORK_AREA_MAX_ALLOC_SZ);
+	size = min_t(size_t, size, RTAS_WORK_AREA_MAX_ALLOC_SZ);
+
+	if (!rwa_state->available) {
+		area = rtas_work_area_alloc_early(size);
+		goto out;
+	}
+
+	/*
+	 * To ensure FCFS behavior and prevent a high rate of smaller
+	 * requests from starving larger ones, use the mutex to queue
+	 * allocations.
+	 */
+	mutex_lock(&rwa_state->mutex);
+	wait_event(rwa_state->wqh,
+		   (addr = gen_pool_alloc(rwa_state->gen_pool, size)) != 0);
+	mutex_unlock(&rwa_state->mutex);
+
+	area = mempool_alloc(&rwa_state->descriptor_pool, GFP_KERNEL);
+	*area = (typeof(*area)){
+		.size = size,
+		.buf = (char *)addr,
+	};
+out:
+	pr_devel("%ps -> %s() -> buf=%p size=%zu\n",
+		 (void *)_RET_IP_, __func__, area->buf, area->size);
+
+	return area;
+}
+
+void __ref rtas_work_area_free(struct rtas_work_area *area)
+{
+	pr_devel("%ps -> %s() -> buf=%p size=%zu\n",
+		 (void *)_RET_IP_, __func__, area->buf, area->size);
+
+	if (!rwa_state->available) {
+		rtas_work_area_free_early(area);
+		return;
+	}
+
+	gen_pool_free(rwa_state->gen_pool, (unsigned long)area->buf, area->size);
+	mempool_free(area, &rwa_state->descriptor_pool);
+	wake_up(&rwa_state->wqh);
+}
+
+/*
+ * Initialization of the work area allocator happens in two parts. To
+ * reliably reserve an arena that satisfies RTAS addressing
+ * requirements, we must perform a memblock allocation early,
+ * immediately after RTAS instantiation. Then we have to wait until
+ * the slab allocator is up before setting up the descriptor mempool
+ * and adding the arena to a gen_pool.
+ */
+static __init int rtas_work_area_allocator_init(void)
+{
+	const unsigned int order = ilog2(RTAS_WORK_AREA_MIN_ALLOC_SZ);
+	const phys_addr_t pa_start = __pa(rwa_state->arena);
+	const phys_addr_t pa_end = pa_start + RTAS_WORK_AREA_ARENA_SZ - 1;
+	struct gen_pool *pool;
+	const int nid = NUMA_NO_NODE;
+	int err;
+
+	err = -ENOMEM;
+	if (!rwa_state->arena)
+		goto err_out;
+
+	pool = gen_pool_create(order, nid);
+	if (!pool)
+		goto err_out;
+	/*
+	 * All RTAS functions that consume work areas are OK with
+	 * natural alignment, when they have alignment requirements at
+	 * all.
+	 */
+	gen_pool_set_algo(pool, gen_pool_first_fit_order_align, NULL);
+
+	err = gen_pool_add(pool, (unsigned long)rwa_state->arena,
+			   RTAS_WORK_AREA_ARENA_SZ, nid);
+	if (err)
+		goto err_destroy;
+
+	err = mempool_init_kmalloc_pool(&rwa_state->descriptor_pool, 1,
+					sizeof(struct rtas_work_area));
+	if (err)
+		goto err_destroy;
+
+	rwa_state->gen_pool = pool;
+	rwa_state->available = true;
+
+	pr_debug("arena [%pa-%pa] (%uK), min/max alloc sizes %u/%u\n",
+		 &pa_start, &pa_end,
+		 RTAS_WORK_AREA_ARENA_SZ / SZ_1K,
+		 RTAS_WORK_AREA_MIN_ALLOC_SZ,
+		 RTAS_WORK_AREA_MAX_ALLOC_SZ);
+
+	return 0;
+
+err_destroy:
+	gen_pool_destroy(pool);
+err_out:
+	return err;
+}
+machine_arch_initcall(pseries, rtas_work_area_allocator_init);
+
+/**
+ * rtas_work_area_reserve_arena() - reserve memory suitable for RTAS work areas.
+ */
+int __init rtas_work_area_reserve_arena(const phys_addr_t limit)
+{
+	const phys_addr_t align = RTAS_WORK_AREA_ARENA_ALIGN;
+	const phys_addr_t size = RTAS_WORK_AREA_ARENA_SZ;
+	const phys_addr_t min = MEMBLOCK_LOW_LIMIT;
+	const int nid = NUMA_NO_NODE;
+
+	rwa_state->arena = memblock_alloc_try_nid(size, align, min, limit, nid);
+	if (!rwa_state->arena)
+		return -ENOMEM;
+
+	return 0;
+}
diff --git a/arch/powerpc/kernel/rtas.c b/arch/powerpc/kernel/rtas.c
index 3290f25b9b34..41c430dc40c2 100644
--- a/arch/powerpc/kernel/rtas.c
+++ b/arch/powerpc/kernel/rtas.c
@@ -36,6 +36,7 @@
 #include <asm/machdep.h>
 #include <asm/mmu.h>
 #include <asm/page.h>
+#include <asm/rtas-work-area.h>
 #include <asm/rtas.h>
 #include <asm/time.h>
 #include <asm/trace.h>
@@ -1938,6 +1939,8 @@ void __init rtas_initialize(void)
 #endif
 	ibm_open_errinjct_token = rtas_token("ibm,open-errinjct");
 	ibm_errinjct_token = rtas_token("ibm,errinjct");
+
+	rtas_work_area_reserve_arena(rtas_region);
 }
 
 int __init early_init_dt_scan_rtas(unsigned long node,

-- 
2.39.1


^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [PATCH v2 12/19] powerpc/pseries/dlpar: use RTAS work area API
  2023-02-06 18:54 ` Nathan Lynch via B4 Submission Endpoint
@ 2023-02-06 18:54   ` Nathan Lynch via B4 Submission Endpoint
  -1 siblings, 0 replies; 50+ messages in thread
From: Nathan Lynch @ 2023-02-06 18:54 UTC (permalink / raw)
  To: Michael Ellerman, Nicholas Piggin, Christophe Leroy, Kajol Jain,
	Laurent Dufour, Mahesh J Salgaonkar, Andrew Donnellan,
	Nick Child
  Cc: linuxppc-dev, Nathan Lynch

Hold a work area object for the duration of the RTAS
ibm,configure-connector sequence, eliminating locking and copying
around each RTAS call.
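
A userspace sketch of the resulting loop shape, for illustration
only. The mock firmware, the SKETCH_* status values other than the -2
busy status, and the cursor kept in the buffer are all hypothetical;
the point is that state left in the work area by one call is still
there for the next, so nothing needs to be copied in or out between
calls.

```c
#include <assert.h>
#include <string.h>

enum { SKETCH_COMPLETE = 0, SKETCH_MORE = 1, SKETCH_BUSY = -2 };

/* Hypothetical layout: the mock firmware keeps a cursor in the buffer. */
struct sketch_cc_state {
	unsigned int drc_index;
	unsigned int cursor;
};

/* Buffer owned by the caller for the whole sequence. */
union sketch_work_area {
	struct sketch_cc_state st;
	char raw[4096];
};

/* Counts mock firmware invocations, including busy ones. */
static int sketch_calls;

/* Mock firmware: every other call is busy; otherwise advance the cursor. */
static int sketch_configure_connector(union sketch_work_area *wa)
{
	if (sketch_calls++ % 2 == 0)
		return SKETCH_BUSY;
	wa->st.cursor++;
	return wa->st.cursor < 3 ? SKETCH_MORE : SKETCH_COMPLETE;
}

/* Mock retry decision; the kernel helper may also sleep. */
static int sketch_busy_delay(int rc)
{
	return rc == SKETCH_BUSY;
}

/* Returns the number of non-busy results seen before completion. */
static unsigned int sketch_walk(unsigned int drc_index)
{
	static union sketch_work_area wa;
	unsigned int steps = 0;
	int rc;

	memset(&wa, 0, sizeof(wa));
	wa.st.drc_index = drc_index;

	do {
		/* Inner loop: retry the same call until it is not busy. */
		do {
			rc = sketch_configure_connector(&wa);
		} while (sketch_busy_delay(rc));

		/* Outer loop body: act on the result, then ask for more. */
		steps++;
	} while (rc != SKETCH_COMPLETE);

	return steps;
}
```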

Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
---
 arch/powerpc/platforms/pseries/dlpar.c | 27 +++++++++------------------
 1 file changed, 9 insertions(+), 18 deletions(-)

diff --git a/arch/powerpc/platforms/pseries/dlpar.c b/arch/powerpc/platforms/pseries/dlpar.c
index 498d6efcb5ae..9b65b50a5456 100644
--- a/arch/powerpc/platforms/pseries/dlpar.c
+++ b/arch/powerpc/platforms/pseries/dlpar.c
@@ -22,6 +22,7 @@
 #include <asm/machdep.h>
 #include <linux/uaccess.h>
 #include <asm/rtas.h>
+#include <asm/rtas-work-area.h>
 
 static struct workqueue_struct *pseries_hp_wq;
 
@@ -137,6 +138,7 @@ struct device_node *dlpar_configure_connector(__be32 drc_index,
 	struct property *property;
 	struct property *last_property = NULL;
 	struct cc_workarea *ccwa;
+	struct rtas_work_area *work_area;
 	char *data_buf;
 	int cc_token;
 	int rc = -1;
@@ -145,29 +147,18 @@ struct device_node *dlpar_configure_connector(__be32 drc_index,
 	if (cc_token == RTAS_UNKNOWN_SERVICE)
 		return NULL;
 
-	data_buf = kzalloc(RTAS_DATA_BUF_SIZE, GFP_KERNEL);
-	if (!data_buf)
-		return NULL;
+	work_area = rtas_work_area_alloc(SZ_4K);
+	data_buf = rtas_work_area_raw_buf(work_area);
 
 	ccwa = (struct cc_workarea *)&data_buf[0];
 	ccwa->drc_index = drc_index;
 	ccwa->zero = 0;
 
 	do {
-		/* Since we release the rtas_data_buf lock between configure
-		 * connector calls we want to re-populate the rtas_data_buffer
-		 * with the contents of the previous call.
-		 */
-		spin_lock(&rtas_data_buf_lock);
-
-		memcpy(rtas_data_buf, data_buf, RTAS_DATA_BUF_SIZE);
-		rc = rtas_call(cc_token, 2, 1, NULL, rtas_data_buf, NULL);
-		memcpy(data_buf, rtas_data_buf, RTAS_DATA_BUF_SIZE);
-
-		spin_unlock(&rtas_data_buf_lock);
-
-		if (rtas_busy_delay(rc))
-			continue;
+		do {
+			rc = rtas_call(cc_token, 2, 1, NULL,
+				       rtas_work_area_phys(work_area), NULL);
+		} while (rtas_busy_delay(rc));
 
 		switch (rc) {
 		case COMPLETE:
@@ -227,7 +218,7 @@ struct device_node *dlpar_configure_connector(__be32 drc_index,
 	} while (rc);
 
 cc_error:
-	kfree(data_buf);
+	rtas_work_area_free(work_area);
 
 	if (rc) {
 		if (first_dn)

-- 
2.39.1


^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [PATCH v2 13/19] powerpc/pseries: PAPR system parameter API
  2023-02-06 18:54 ` Nathan Lynch via B4 Submission Endpoint
@ 2023-02-06 18:54   ` Nathan Lynch via B4 Submission Endpoint
  -1 siblings, 0 replies; 50+ messages in thread
From: Nathan Lynch @ 2023-02-06 18:54 UTC (permalink / raw)
  To: Michael Ellerman, Nicholas Piggin, Christophe Leroy, Kajol Jain,
	Laurent Dufour, Mahesh J Salgaonkar, Andrew Donnellan,
	Nick Child
  Cc: linuxppc-dev, Nathan Lynch

Introduce a set of APIs for retrieving and updating PAPR system
parameters. This encapsulates the toil of temporary RTAS work area
management, RTAS function call retries, and translation of RTAS call
statuses to conventional error values.

There are several places in the kernel that already retrieve system
parameters by calling the RTAS ibm,get-system-parameter function
directly. These will be converted to papr_sysparm_get() in changes to
follow.

As for updating system parameters, current practice is to use
sys_rtas() from user space; there are no in-kernel users of the RTAS
ibm,set-system-parameter function. However, this practice will be
deprecated in time because it is not compatible with lockdown.

The papr_sysparm_* APIs will form the common basis for in-kernel
and user space access to system parameters. The code to expose the
set/get capabilities to user space will follow.

Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
---
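Editorial sketch, not part of the patch: the "translation of RTAS call
statuses to conventional error values" mentioned above can be shown in
isolation. The helper name sysparm_status_to_errno() is hypothetical;
the status values and their errno mappings are taken from the switch
statements in papr_sysparm_get() and papr_sysparm_set() in this patch.
Unlike the kernel code, the sketch does not log unexpected statuses.

```c
#include <assert.h>
#include <errno.h>

/* Map a final (non-busy) RTAS status to 0 or a negative errno value. */
static int sysparm_status_to_errno(int fwrc)
{
	switch (fwrc) {
	case 0:			/* success */
		return 0;
	case -3:		/* parameter not implemented/supported */
		return -EOPNOTSUPP;
	case -9002:		/* partition not authorized */
		return -EPERM;
	case -9999:		/* "parameter error" */
		return -EINVAL;
	case -1:		/* hardware/platform error */
	default:		/* anything unexpected is also an I/O error */
		return -EIO;
	}
}

/* Small self-check that the table above matches the patch. */
static void sysparm_status_selfcheck(void)
{
	assert(sysparm_status_to_errno(0) == 0);
	assert(sysparm_status_to_errno(-3) == -EOPNOTSUPP);
	assert(sysparm_status_to_errno(-9002) == -EPERM);
	assert(sysparm_status_to_errno(-9999) == -EINVAL);
	assert(sysparm_status_to_errno(-1) == -EIO);
}
```

Callers therefore only need to test for a nonzero return, as the
converted call sites later in the series do.
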
 arch/powerpc/include/asm/papr-sysparm.h       |  38 +++++++
 arch/powerpc/platforms/pseries/Makefile       |   2 +-
 arch/powerpc/platforms/pseries/papr-sysparm.c | 151 ++++++++++++++++++++++++++
 3 files changed, 190 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/include/asm/papr-sysparm.h b/arch/powerpc/include/asm/papr-sysparm.h
new file mode 100644
index 000000000000..7645a71e5369
--- /dev/null
+++ b/arch/powerpc/include/asm/papr-sysparm.h
@@ -0,0 +1,38 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+#ifndef POWERPC_PAPR_SYSPARM_H
+#define POWERPC_PAPR_SYSPARM_H
+
+typedef struct {
+	const u32 token;
+} papr_sysparm_t;
+
+#define mk_papr_sysparm(x_) ((papr_sysparm_t){ .token = x_, })
+
+/*
+ * Derived from the "Defined Parameters" table in PAPR 7.3.16 System
+ * Parameters Option. Where the spec says "characteristics", we use
+ * "attrs" in the symbolic names to keep them from getting too
+ * unwieldy.
+ */
+#define PAPR_SYSPARM_SHARED_PROC_LPAR_ATTRS        mk_papr_sysparm(20)
+#define PAPR_SYSPARM_PROC_MODULE_INFO              mk_papr_sysparm(43)
+#define PAPR_SYSPARM_COOP_MEM_OVERCOMMIT_ATTRS     mk_papr_sysparm(44)
+#define PAPR_SYSPARM_TLB_BLOCK_INVALIDATE_ATTRS    mk_papr_sysparm(50)
+#define PAPR_SYSPARM_LPAR_NAME                     mk_papr_sysparm(55)
+
+enum {
+	PAPR_SYSPARM_MAX_INPUT  = 1024,
+	PAPR_SYSPARM_MAX_OUTPUT = 4000,
+};
+
+struct papr_sysparm_buf {
+	__be16 len;
+	char val[PAPR_SYSPARM_MAX_OUTPUT];
+};
+
+struct papr_sysparm_buf *papr_sysparm_buf_alloc(void);
+void papr_sysparm_buf_free(struct papr_sysparm_buf *buf);
+int papr_sysparm_set(papr_sysparm_t param, const struct papr_sysparm_buf *buf);
+int papr_sysparm_get(papr_sysparm_t param, struct papr_sysparm_buf *buf);
+
+#endif /* POWERPC_PAPR_SYSPARM_H */
diff --git a/arch/powerpc/platforms/pseries/Makefile b/arch/powerpc/platforms/pseries/Makefile
index 92310202bdd7..a9935f864a16 100644
--- a/arch/powerpc/platforms/pseries/Makefile
+++ b/arch/powerpc/platforms/pseries/Makefile
@@ -3,7 +3,7 @@ ccflags-$(CONFIG_PPC64)			:= $(NO_MINIMAL_TOC)
 ccflags-$(CONFIG_PPC_PSERIES_DEBUG)	+= -DDEBUG
 
 obj-y			:= lpar.o hvCall.o nvram.o reconfig.o \
-			   of_helpers.o \
+			   of_helpers.o papr-sysparm.o \
 			   setup.o iommu.o event_sources.o ras.o \
 			   firmware.o power.o dlpar.o mobility.o rng.o \
 			   pci.o pci_dlpar.o eeh_pseries.o msi.o \
diff --git a/arch/powerpc/platforms/pseries/papr-sysparm.c b/arch/powerpc/platforms/pseries/papr-sysparm.c
new file mode 100644
index 000000000000..2bb5c816399b
--- /dev/null
+++ b/arch/powerpc/platforms/pseries/papr-sysparm.c
@@ -0,0 +1,151 @@
+// SPDX-License-Identifier: GPL-2.0-only
+
+#define pr_fmt(fmt)	"papr-sysparm: " fmt
+
+#include <linux/bug.h>
+#include <linux/init.h>
+#include <linux/kernel.h>
+#include <linux/printk.h>
+#include <linux/slab.h>
+#include <asm/rtas.h>
+#include <asm/papr-sysparm.h>
+#include <asm/rtas-work-area.h>
+
+struct papr_sysparm_buf *papr_sysparm_buf_alloc(void)
+{
+	struct papr_sysparm_buf *buf = kzalloc(sizeof(*buf), GFP_KERNEL);
+
+	return buf;
+}
+
+void papr_sysparm_buf_free(struct papr_sysparm_buf *buf)
+{
+	kfree(buf);
+}
+
+/**
+ * papr_sysparm_get() - Retrieve the value of a PAPR system parameter.
+ * @param: PAPR system parameter token as described in
+ *         7.3.16 "System Parameters Option".
+ * @buf: A &struct papr_sysparm_buf as returned from papr_sysparm_buf_alloc().
+ *
+ * Place the result of querying the specified parameter, if available,
+ * in @buf. The result includes a be16 length header followed by the
+ * value, which may be a string or binary data. See &struct papr_sysparm_buf.
+ *
+ * Since there is at least one parameter (60, OS Service Entitlement
+ * Status) where the results depend on the incoming contents of the
+ * work area, the caller-supplied buffer is copied unmodified into the
+ * work area before calling ibm,get-system-parameter.
+ *
+ * A defined parameter may not be implemented on a given system, and
+ * some implemented parameters may not be available to all partitions
+ * on a system. A parameter's disposition may change at any time due
+ * to system configuration changes or partition migration.
+ *
+ * Context: This function may sleep.
+ *
+ * Return: 0 on success, -errno otherwise. @buf is unmodified on error.
+ */
+
+int papr_sysparm_get(papr_sysparm_t param, struct papr_sysparm_buf *buf)
+{
+	const s32 token = rtas_token("ibm,get-system-parameter");
+	struct rtas_work_area *work_area;
+	s32 fwrc;
+	int ret;
+
+	might_sleep();
+
+	if (WARN_ON(!buf))
+		return -EFAULT;
+
+	if (token == RTAS_UNKNOWN_SERVICE)
+		return -ENOENT;
+
+	work_area = rtas_work_area_alloc(sizeof(*buf));
+
+	memcpy(rtas_work_area_raw_buf(work_area), buf, sizeof(*buf));
+
+	do {
+		fwrc = rtas_call(token, 3, 1, NULL, param.token,
+				 rtas_work_area_phys(work_area),
+				 rtas_work_area_size(work_area));
+	} while (rtas_busy_delay(fwrc));
+
+	switch (fwrc) {
+	case 0:
+		ret = 0;
+		memcpy(buf, rtas_work_area_raw_buf(work_area), sizeof(*buf));
+		break;
+	case -3: /* parameter not implemented */
+		ret = -EOPNOTSUPP;
+		break;
+	case -9002: /* this partition not authorized to retrieve this parameter */
+		ret = -EPERM;
+		break;
+	case -9999: /* "parameter error" e.g. the buffer is too small */
+		ret = -EINVAL;
+		break;
+	default:
+		pr_err("unexpected ibm,get-system-parameter result %d\n", fwrc);
+		fallthrough;
+	case -1: /* Hardware/platform error */
+		ret = -EIO;
+		break;
+	}
+
+	rtas_work_area_free(work_area);
+
+	return ret;
+}
+
+int papr_sysparm_set(papr_sysparm_t param, const struct papr_sysparm_buf *buf)
+{
+	const s32 token = rtas_token("ibm,set-system-parameter");
+	struct rtas_work_area *work_area;
+	s32 fwrc;
+	int ret;
+
+	might_sleep();
+
+	if (WARN_ON(!buf))
+		return -EFAULT;
+
+	if (token == RTAS_UNKNOWN_SERVICE)
+		return -ENOENT;
+
+	work_area = rtas_work_area_alloc(sizeof(*buf));
+
+	memcpy(rtas_work_area_raw_buf(work_area), buf, sizeof(*buf));
+
+	do {
+		fwrc = rtas_call(token, 2, 1, NULL, param.token,
+				 rtas_work_area_phys(work_area));
+	} while (rtas_busy_delay(fwrc));
+
+	switch (fwrc) {
+	case 0:
+		ret = 0;
+		break;
+	case -3: /* parameter not supported */
+		ret = -EOPNOTSUPP;
+		break;
+	case -9002: /* this partition not authorized to modify this parameter */
+		ret = -EPERM;
+		break;
+	case -9999: /* "parameter error" e.g. invalid input data */
+		ret = -EINVAL;
+		break;
+	default:
+		pr_err("unexpected ibm,set-system-parameter result %d\n", fwrc);
+		fallthrough;
+	case -1: /* Hardware/platform error */
+		ret = -EIO;
+		break;
+	}
+
+	rtas_work_area_free(work_area);
+
+	return ret;
+}

-- 
2.39.1



* [PATCH v2 14/19] powerpc/pseries: convert CMO probe to papr_sysparm API
  2023-02-06 18:54 ` Nathan Lynch via B4 Submission Endpoint
@ 2023-02-06 18:54   ` Nathan Lynch via B4 Submission Endpoint
  -1 siblings, 0 replies; 50+ messages in thread
From: Nathan Lynch @ 2023-02-06 18:54 UTC (permalink / raw)
  To: Michael Ellerman, Nicholas Piggin, Christophe Leroy, Kajol Jain,
	Laurent Dufour, Mahesh J Salgaonkar, Andrew Donnellan,
	Nick Child
  Cc: linuxppc-dev, Nathan Lynch

Convert the direct invocation of the ibm,get-system-parameter RTAS
function to papr_sysparm_get().

Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
---
 arch/powerpc/platforms/pseries/setup.c | 23 ++++++-----------------
 1 file changed, 6 insertions(+), 17 deletions(-)

diff --git a/arch/powerpc/platforms/pseries/setup.c b/arch/powerpc/platforms/pseries/setup.c
index 74e50b6b28d4..420a2fa48292 100644
--- a/arch/powerpc/platforms/pseries/setup.c
+++ b/arch/powerpc/platforms/pseries/setup.c
@@ -57,6 +57,7 @@
 #include <asm/pmc.h>
 #include <asm/xics.h>
 #include <asm/xive.h>
+#include <asm/papr-sysparm.h>
 #include <asm/ppc-pci.h>
 #include <asm/i8259.h>
 #include <asm/udbg.h>
@@ -941,32 +942,21 @@ void pSeries_coalesce_init(void)
  */
 static void __init pSeries_cmo_feature_init(void)
 {
-	const s32 token = rtas_token("ibm,get-system-parameter");
+	static struct papr_sysparm_buf buf __initdata;
+	static_assert(sizeof(buf.val) >= CMO_MAXLENGTH);
 	char *ptr, *key, *value, *end;
-	int call_status;
 	int page_order = IOMMU_PAGE_SHIFT_4K;
 
 	pr_debug(" -> fw_cmo_feature_init()\n");
 
-	do {
-		spin_lock(&rtas_data_buf_lock);
-		call_status = rtas_call(token, 3, 1, NULL,
-					CMO_CHARACTERISTICS_TOKEN,
-					__pa(rtas_data_buf),
-					RTAS_DATA_BUF_SIZE);
-		if (call_status == 0)
-			break;
-		spin_unlock(&rtas_data_buf_lock);
-	} while (rtas_busy_delay(call_status));
-
-	if (call_status != 0) {
+	if (papr_sysparm_get(PAPR_SYSPARM_COOP_MEM_OVERCOMMIT_ATTRS, &buf)) {
 		pr_debug("CMO not available\n");
 		pr_debug(" <- fw_cmo_feature_init()\n");
 		return;
 	}
 
-	end = rtas_data_buf + CMO_MAXLENGTH - 2;
-	ptr = rtas_data_buf + 2;	/* step over strlen value */
+	end = &buf.val[CMO_MAXLENGTH];
+	ptr = &buf.val[0];
 	key = value = ptr;
 
 	while (*ptr && (ptr <= end)) {
@@ -1012,7 +1002,6 @@ static void __init pSeries_cmo_feature_init(void)
 	} else
 		pr_debug("CMO not enabled, PrPSP=%d, SecPSP=%d\n", CMO_PrPSP,
 		         CMO_SecPSP);
-	spin_unlock(&rtas_data_buf_lock);
 	pr_debug(" <- fw_cmo_feature_init()\n");
 }
 

-- 
2.39.1



* [PATCH v2 15/19] powerpc/pseries/lparcfg: convert to papr_sysparm API
  2023-02-06 18:54 ` Nathan Lynch via B4 Submission Endpoint
@ 2023-02-06 18:54   ` Nathan Lynch via B4 Submission Endpoint
  -1 siblings, 0 replies; 50+ messages in thread
From: Nathan Lynch @ 2023-02-06 18:54 UTC (permalink / raw)
  To: Michael Ellerman, Nicholas Piggin, Christophe Leroy, Kajol Jain,
	Laurent Dufour, Mahesh J Salgaonkar, Andrew Donnellan,
	Nick Child
  Cc: linuxppc-dev, Nathan Lynch

/proc/powerpc/lparcfg derives the LPAR name and SPLPAR characteristics
it reports using bare calls to the RTAS ibm,get-system-parameter
function. Convert these to the higher-level papr_sysparm API, which
handles the tedious details.

While the SPLPAR string parsing code could stand to be updated, that
should be done in a separate change. It is minimally modified here to
reduce the risk of changing behavior.

Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
---
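Editorial sketch, not part of the patch: system parameter results are
a be16 length header followed by the value, and the code removed from
read_rtas_lpar_name() clamped that length before terminating the
string. The userspace helpers below illustrate that kind of bounded
decode. struct sketch_sysparm_buf, sketch_len() and
sketch_copy_string() are hypothetical names; only the layout (2-byte
big-endian length, up to 4000 bytes of value) comes from the series.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_MAX_OUTPUT 4000	/* same value as PAPR_SYSPARM_MAX_OUTPUT */

/* Userspace stand-in for struct papr_sysparm_buf. */
struct sketch_sysparm_buf {
	uint8_t len[2];		/* big-endian length of the value */
	char val[SKETCH_MAX_OUTPUT];
};

/* Decode the length header without depending on host byte order. */
static size_t sketch_len(const struct sketch_sysparm_buf *buf)
{
	return ((size_t)buf->len[0] << 8) | buf->len[1];
}

/*
 * Copy the value into dst as a NUL-terminated string, clamping to the
 * reported length, the size of val[], and the space in dst.
 * Returns the number of value bytes copied (excluding the added NUL).
 */
static size_t sketch_copy_string(const struct sketch_sysparm_buf *buf,
				 char *dst, size_t dst_size)
{
	size_t n = sketch_len(buf);

	assert(dst_size > 0);
	if (n > sizeof(buf->val))
		n = sizeof(buf->val);
	if (n > dst_size - 1)
		n = dst_size - 1;
	memcpy(dst, buf->val, n);
	dst[n] = '\0';
	return n;
}
```

A reported length larger than the buffer is treated as untrusted input
and clamped rather than followed, so the copy never reads past val[].
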
 arch/powerpc/platforms/pseries/lparcfg.c | 104 +++++++------------------------
 1 file changed, 24 insertions(+), 80 deletions(-)

diff --git a/arch/powerpc/platforms/pseries/lparcfg.c b/arch/powerpc/platforms/pseries/lparcfg.c
index cd33d5800763..8acc70509520 100644
--- a/arch/powerpc/platforms/pseries/lparcfg.c
+++ b/arch/powerpc/platforms/pseries/lparcfg.c
@@ -19,6 +19,7 @@
 #include <linux/errno.h>
 #include <linux/proc_fs.h>
 #include <linux/init.h>
+#include <asm/papr-sysparm.h>
 #include <linux/seq_file.h>
 #include <linux/slab.h>
 #include <linux/uaccess.h>
@@ -311,16 +312,6 @@ static void parse_mpp_x_data(struct seq_file *m)
 		seq_printf(m, "coalesce_pool_spurr=%ld\n", mpp_x_data.pool_spurr_cycles);
 }
 
-/*
- * PAPR defines, in section "7.3.16 System Parameters Option", the token 55 to
- * read the LPAR name, and the largest output data to 4000 + 2 bytes length.
- */
-#define SPLPAR_LPAR_NAME_TOKEN	55
-#define GET_SYS_PARM_BUF_SIZE	4002
-#if GET_SYS_PARM_BUF_SIZE > RTAS_DATA_BUF_SIZE
-#error "GET_SYS_PARM_BUF_SIZE is larger than RTAS_DATA_BUF_SIZE"
-#endif
-
 /*
  * Read the lpar name using the RTAS ibm,get-system-parameter call.
  *
@@ -332,46 +323,19 @@ static void parse_mpp_x_data(struct seq_file *m)
  */
 static int read_rtas_lpar_name(struct seq_file *m)
 {
-	int rc, len, token;
-	union {
-		char raw_buffer[GET_SYS_PARM_BUF_SIZE];
-		struct {
-			__be16 len;
-			char name[GET_SYS_PARM_BUF_SIZE-2];
-		};
-	} *local_buffer;
+	struct papr_sysparm_buf *buf;
+	int err;
 
-	token = rtas_token("ibm,get-system-parameter");
-	if (token == RTAS_UNKNOWN_SERVICE)
-		return -EINVAL;
-
-	local_buffer = kmalloc(sizeof(*local_buffer), GFP_KERNEL);
-	if (!local_buffer)
+	buf = papr_sysparm_buf_alloc();
+	if (!buf)
 		return -ENOMEM;
 
-	do {
-		spin_lock(&rtas_data_buf_lock);
-		memset(rtas_data_buf, 0, sizeof(*local_buffer));
-		rc = rtas_call(token, 3, 1, NULL, SPLPAR_LPAR_NAME_TOKEN,
-			       __pa(rtas_data_buf), sizeof(*local_buffer));
-		if (!rc)
-			memcpy(local_buffer->raw_buffer, rtas_data_buf,
-			       sizeof(local_buffer->raw_buffer));
-		spin_unlock(&rtas_data_buf_lock);
-	} while (rtas_busy_delay(rc));
+	err = papr_sysparm_get(PAPR_SYSPARM_LPAR_NAME, buf);
+	if (!err)
+		seq_printf(m, "partition_name=%s\n", buf->val);
 
-	if (!rc) {
-		/* Force end of string */
-		len = min((int) be16_to_cpu(local_buffer->len),
-			  (int) sizeof(local_buffer->name)-1);
-		local_buffer->name[len] = '\0';
-
-		seq_printf(m, "partition_name=%s\n", local_buffer->name);
-	} else
-		rc = -ENODATA;
-
-	kfree(local_buffer);
-	return rc;
+	papr_sysparm_buf_free(buf);
+	return err;
 }
 
 /*
@@ -397,7 +361,6 @@ static void read_lpar_name(struct seq_file *m)
 		pr_err_once("Error can't get the LPAR name");
 }
 
-#define SPLPAR_CHARACTERISTICS_TOKEN 20
 #define SPLPAR_MAXLENGTH 1026*(sizeof(char))
 
 /*
@@ -408,45 +371,25 @@ static void read_lpar_name(struct seq_file *m)
  */
 static void parse_system_parameter_string(struct seq_file *m)
 {
-	const s32 token = rtas_token("ibm,get-system-parameter");
-	int call_status;
+	struct papr_sysparm_buf *buf;
 
-	unsigned char *local_buffer = kmalloc(SPLPAR_MAXLENGTH, GFP_KERNEL);
-	if (!local_buffer) {
-		printk(KERN_ERR "%s %s kmalloc failure at line %d\n",
-		       __FILE__, __func__, __LINE__);
+	buf = papr_sysparm_buf_alloc();
+	if (!buf)
 		return;
-	}
 
-	do {
-		spin_lock(&rtas_data_buf_lock);
-		memset(rtas_data_buf, 0, SPLPAR_MAXLENGTH);
-		call_status = rtas_call(token, 3, 1, NULL, SPLPAR_CHARACTERISTICS_TOKEN,
-					__pa(rtas_data_buf), RTAS_DATA_BUF_SIZE);
-		memcpy(local_buffer, rtas_data_buf, SPLPAR_MAXLENGTH);
-		local_buffer[SPLPAR_MAXLENGTH - 1] = '\0';
-		spin_unlock(&rtas_data_buf_lock);
-	} while (rtas_busy_delay(call_status));
-
-	if (call_status != 0) {
-		printk(KERN_INFO
-		       "%s %s Error calling get-system-parameter (0x%x)\n",
-		       __FILE__, __func__, call_status);
+	if (papr_sysparm_get(PAPR_SYSPARM_SHARED_PROC_LPAR_ATTRS, buf)) {
+		goto out_free;
 	} else {
+		const char *local_buffer;
 		int splpar_strlen;
 		int idx, w_idx;
 		char *workbuffer = kzalloc(SPLPAR_MAXLENGTH, GFP_KERNEL);
-		if (!workbuffer) {
-			printk(KERN_ERR "%s %s kmalloc failure at line %d\n",
-			       __FILE__, __func__, __LINE__);
-			kfree(local_buffer);
-			return;
-		}
-#ifdef LPARCFG_DEBUG
-		printk(KERN_INFO "success calling get-system-parameter\n");
-#endif
-		splpar_strlen = local_buffer[0] * 256 + local_buffer[1];
-		local_buffer += 2;	/* step over strlen value */
+
+		if (!workbuffer)
+			goto out_free;
+
+		splpar_strlen = be16_to_cpu(buf->len);
+		local_buffer = buf->val;
 
 		w_idx = 0;
 		idx = 0;
@@ -480,7 +423,8 @@ static void parse_system_parameter_string(struct seq_file *m)
 		kfree(workbuffer);
 		local_buffer -= 2;	/* back up over strlen value */
 	}
-	kfree(local_buffer);
+out_free:
+	papr_sysparm_buf_free(buf);
 }
 
 /* Return the number of processors in the system.

-- 
2.39.1



* [PATCH v2 15/19] powerpc/pseries/lparcfg: convert to papr_sysparm API
@ 2023-02-06 18:54   ` Nathan Lynch via B4 Submission Endpoint
  0 siblings, 0 replies; 50+ messages in thread
From: Nathan Lynch via B4 Submission Endpoint @ 2023-02-06 18:54 UTC (permalink / raw)
  To: Michael Ellerman, Nicholas Piggin, Christophe Leroy, Kajol Jain,
	Laurent Dufour, Mahesh J Salgaonkar, Andrew Donnellan,
	Nick Child
  Cc: Nathan Lynch, linuxppc-dev

From: Nathan Lynch <nathanl@linux.ibm.com>

/proc/powerpc/lparcfg derives the LPAR name and SPLPAR characteristics
it reports using bare calls to the RTAS ibm,get-system-parameter
function. Convert these to the higher-level papr_sysparm API, which
handles the tedious details.

While the SPLPAR string parsing code could stand to be updated, that
should be done in a separate change. It is minimally modified here to
reduce the risk of changing behavior.

Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
---
 arch/powerpc/platforms/pseries/lparcfg.c | 104 +++++++------------------------
 1 file changed, 24 insertions(+), 80 deletions(-)

diff --git a/arch/powerpc/platforms/pseries/lparcfg.c b/arch/powerpc/platforms/pseries/lparcfg.c
index cd33d5800763..8acc70509520 100644
--- a/arch/powerpc/platforms/pseries/lparcfg.c
+++ b/arch/powerpc/platforms/pseries/lparcfg.c
@@ -19,6 +19,7 @@
 #include <linux/errno.h>
 #include <linux/proc_fs.h>
 #include <linux/init.h>
+#include <asm/papr-sysparm.h>
 #include <linux/seq_file.h>
 #include <linux/slab.h>
 #include <linux/uaccess.h>
@@ -311,16 +312,6 @@ static void parse_mpp_x_data(struct seq_file *m)
 		seq_printf(m, "coalesce_pool_spurr=%ld\n", mpp_x_data.pool_spurr_cycles);
 }
 
-/*
- * PAPR defines, in section "7.3.16 System Parameters Option", the token 55 to
- * read the LPAR name, and the largest output data to 4000 + 2 bytes length.
- */
-#define SPLPAR_LPAR_NAME_TOKEN	55
-#define GET_SYS_PARM_BUF_SIZE	4002
-#if GET_SYS_PARM_BUF_SIZE > RTAS_DATA_BUF_SIZE
-#error "GET_SYS_PARM_BUF_SIZE is larger than RTAS_DATA_BUF_SIZE"
-#endif
-
 /*
  * Read the lpar name using the RTAS ibm,get-system-parameter call.
  *
@@ -332,46 +323,19 @@ static void parse_mpp_x_data(struct seq_file *m)
  */
 static int read_rtas_lpar_name(struct seq_file *m)
 {
-	int rc, len, token;
-	union {
-		char raw_buffer[GET_SYS_PARM_BUF_SIZE];
-		struct {
-			__be16 len;
-			char name[GET_SYS_PARM_BUF_SIZE-2];
-		};
-	} *local_buffer;
+	struct papr_sysparm_buf *buf;
+	int err;
 
-	token = rtas_token("ibm,get-system-parameter");
-	if (token == RTAS_UNKNOWN_SERVICE)
-		return -EINVAL;
-
-	local_buffer = kmalloc(sizeof(*local_buffer), GFP_KERNEL);
-	if (!local_buffer)
+	buf = papr_sysparm_buf_alloc();
+	if (!buf)
 		return -ENOMEM;
 
-	do {
-		spin_lock(&rtas_data_buf_lock);
-		memset(rtas_data_buf, 0, sizeof(*local_buffer));
-		rc = rtas_call(token, 3, 1, NULL, SPLPAR_LPAR_NAME_TOKEN,
-			       __pa(rtas_data_buf), sizeof(*local_buffer));
-		if (!rc)
-			memcpy(local_buffer->raw_buffer, rtas_data_buf,
-			       sizeof(local_buffer->raw_buffer));
-		spin_unlock(&rtas_data_buf_lock);
-	} while (rtas_busy_delay(rc));
+	err = papr_sysparm_get(PAPR_SYSPARM_LPAR_NAME, buf);
+	if (!err)
+		seq_printf(m, "partition_name=%s\n", buf->val);
 
-	if (!rc) {
-		/* Force end of string */
-		len = min((int) be16_to_cpu(local_buffer->len),
-			  (int) sizeof(local_buffer->name)-1);
-		local_buffer->name[len] = '\0';
-
-		seq_printf(m, "partition_name=%s\n", local_buffer->name);
-	} else
-		rc = -ENODATA;
-
-	kfree(local_buffer);
-	return rc;
+	papr_sysparm_buf_free(buf);
+	return err;
 }
 
 /*
@@ -397,7 +361,6 @@ static void read_lpar_name(struct seq_file *m)
 		pr_err_once("Error can't get the LPAR name");
 }
 
-#define SPLPAR_CHARACTERISTICS_TOKEN 20
 #define SPLPAR_MAXLENGTH 1026*(sizeof(char))
 
 /*
@@ -408,45 +371,25 @@ static void read_lpar_name(struct seq_file *m)
  */
 static void parse_system_parameter_string(struct seq_file *m)
 {
-	const s32 token = rtas_token("ibm,get-system-parameter");
-	int call_status;
+	struct papr_sysparm_buf *buf;
 
-	unsigned char *local_buffer = kmalloc(SPLPAR_MAXLENGTH, GFP_KERNEL);
-	if (!local_buffer) {
-		printk(KERN_ERR "%s %s kmalloc failure at line %d\n",
-		       __FILE__, __func__, __LINE__);
+	buf = papr_sysparm_buf_alloc();
+	if (!buf)
 		return;
-	}
 
-	do {
-		spin_lock(&rtas_data_buf_lock);
-		memset(rtas_data_buf, 0, SPLPAR_MAXLENGTH);
-		call_status = rtas_call(token, 3, 1, NULL, SPLPAR_CHARACTERISTICS_TOKEN,
-					__pa(rtas_data_buf), RTAS_DATA_BUF_SIZE);
-		memcpy(local_buffer, rtas_data_buf, SPLPAR_MAXLENGTH);
-		local_buffer[SPLPAR_MAXLENGTH - 1] = '\0';
-		spin_unlock(&rtas_data_buf_lock);
-	} while (rtas_busy_delay(call_status));
-
-	if (call_status != 0) {
-		printk(KERN_INFO
-		       "%s %s Error calling get-system-parameter (0x%x)\n",
-		       __FILE__, __func__, call_status);
+	if (papr_sysparm_get(PAPR_SYSPARM_SHARED_PROC_LPAR_ATTRS, buf)) {
+		goto out_free;
 	} else {
+		const char *local_buffer;
 		int splpar_strlen;
 		int idx, w_idx;
 		char *workbuffer = kzalloc(SPLPAR_MAXLENGTH, GFP_KERNEL);
-		if (!workbuffer) {
-			printk(KERN_ERR "%s %s kmalloc failure at line %d\n",
-			       __FILE__, __func__, __LINE__);
-			kfree(local_buffer);
-			return;
-		}
-#ifdef LPARCFG_DEBUG
-		printk(KERN_INFO "success calling get-system-parameter\n");
-#endif
-		splpar_strlen = local_buffer[0] * 256 + local_buffer[1];
-		local_buffer += 2;	/* step over strlen value */
+
+		if (!workbuffer)
+			goto out_free;
+
+		splpar_strlen = be16_to_cpu(buf->len);
+		local_buffer = buf->val;
 
 		w_idx = 0;
 		idx = 0;
@@ -480,7 +423,8 @@ static void parse_system_parameter_string(struct seq_file *m)
 		kfree(workbuffer);
 		local_buffer -= 2;	/* back up over strlen value */
 	}
-	kfree(local_buffer);
+out_free:
+	papr_sysparm_buf_free(buf);
 }
 
 /* Return the number of processors in the system.

-- 
2.39.1


* [PATCH v2 16/19] powerpc/pseries/hv-24x7: convert to papr_sysparm API
  2023-02-06 18:54 ` Nathan Lynch via B4 Submission Endpoint
@ 2023-02-06 18:54   ` Nathan Lynch
  -1 siblings, 0 replies; 50+ messages in thread
From: Nathan Lynch via B4 Submission Endpoint @ 2023-02-06 18:54 UTC (permalink / raw)
  To: Michael Ellerman, Nicholas Piggin, Christophe Leroy, Kajol Jain,
	Laurent Dufour, Mahesh J Salgaonkar, Andrew Donnellan,
	Nick Child
  Cc: Nathan Lynch, linuxppc-dev

From: Nathan Lynch <nathanl@linux.ibm.com>

The new papr_sysparm API handles the details of system parameter
retrieval. Use that instead of open-coding the RTAS call, work area
management, and retries.

Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
---
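Not part of the patch: a minimal userspace sketch of the payload
decoding that read_24x7_sys_info() performs once papr_sysparm_get()
succeeds. Every name ending in _sketch is a hypothetical stand-in, not
a kernel definition. The layout (a big-endian 16-bit length followed by
the payload bytes) is assumed from how the diff below uses buf->len and
buf->val, and the 4000-byte payload size is likewise an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct papr_sysparm_buf (layout assumed). */
struct sysparm_buf_sketch {
	uint8_t len[2];		/* big-endian payload length */
	uint8_t val[4000];	/* payload bytes (size is an assumption) */
};

struct module_info_sketch {
	uint16_t sockets;
	uint16_t chips_per_socket;
	uint16_t cores_per_chip;
};

/* Read a big-endian 16-bit value from two bytes, like be16_to_cpup(). */
static uint16_t read_be16(const uint8_t *p)
{
	return (uint16_t)((p[0] << 8) | p[1]);
}

/*
 * Decode processor module info from an already-filled buffer.
 * Returns 1 and fills *out when the payload is at least 8 bytes and
 * reports at least one module type; returns 0 and leaves *out
 * untouched otherwise, so callers keep their defaults.
 */
static int decode_module_info_sketch(const struct sysparm_buf_sketch *buf,
				     struct module_info_sketch *out)
{
	uint16_t len, ntypes;

	assert(buf != NULL && out != NULL);

	len = read_be16(buf->len);
	ntypes = read_be16(&buf->val[0]);

	if (len < 8 || ntypes == 0)
		return 0;

	out->sockets = read_be16(&buf->val[2]);
	out->chips_per_socket = read_be16(&buf->val[4]);
	out->cores_per_chip = read_be16(&buf->val[6]);
	return 1;
}
```

The offsets into val[] are each two less than the old rtas_data_buf
offsets because val[] begins after the two length bytes.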
 arch/powerpc/perf/hv-24x7.c | 37 +++++++++++++++----------------------
 1 file changed, 15 insertions(+), 22 deletions(-)

diff --git a/arch/powerpc/perf/hv-24x7.c b/arch/powerpc/perf/hv-24x7.c
index fcfebf5bd378..f388b984a336 100644
--- a/arch/powerpc/perf/hv-24x7.c
+++ b/arch/powerpc/perf/hv-24x7.c
@@ -18,6 +18,7 @@
 #include <asm/firmware.h>
 #include <asm/hvcall.h>
 #include <asm/io.h>
+#include <asm/papr-sysparm.h>
 #include <linux/byteorder/generic.h>
 
 #include <asm/rtas.h>
@@ -66,8 +67,6 @@ static bool is_physical_domain(unsigned int domain)
  * Refer PAPR+ document to get parameter token value as '43'.
  */
 
-#define PROCESSOR_MODULE_INFO   43
-
 static u32 phys_sockets;	/* Physical sockets */
 static u32 phys_chipspersocket;	/* Physical chips per socket*/
 static u32 phys_coresperchip; /* Physical cores per chip */
@@ -79,8 +78,7 @@ static u32 phys_coresperchip; /* Physical cores per chip */
  */
 void read_24x7_sys_info(void)
 {
-	const s32 token = rtas_token("ibm,get-system-parameter");
-	int call_status;
+	struct papr_sysparm_buf *buf;
 
 	/*
 	 * Making system parameter: chips and sockets and cores per chip
@@ -90,27 +88,22 @@ void read_24x7_sys_info(void)
 	phys_chipspersocket = 1;
 	phys_coresperchip = 1;
 
-	do {
-		spin_lock(&rtas_data_buf_lock);
-		call_status = rtas_call(token, 3, 1, NULL, PROCESSOR_MODULE_INFO,
-					__pa(rtas_data_buf), RTAS_DATA_BUF_SIZE);
-		if (call_status == 0) {
-			int ntypes = be16_to_cpup((__be16 *)&rtas_data_buf[2]);
-			int len = be16_to_cpup((__be16 *)&rtas_data_buf[0]);
+	buf = papr_sysparm_buf_alloc();
+	if (!buf)
+		return;
 
-			if (len >= 8 && ntypes != 0) {
-				phys_sockets = be16_to_cpup((__be16 *)&rtas_data_buf[4]);
-				phys_chipspersocket = be16_to_cpup((__be16 *)&rtas_data_buf[6]);
-				phys_coresperchip = be16_to_cpup((__be16 *)&rtas_data_buf[8]);
-			}
+	if (!papr_sysparm_get(PAPR_SYSPARM_PROC_MODULE_INFO, buf)) {
+		int ntypes = be16_to_cpup((__be16 *)&buf->val[0]);
+		int len = be16_to_cpu(buf->len);
+
+		if (len >= 8 && ntypes != 0) {
+			phys_sockets = be16_to_cpup((__be16 *)&buf->val[2]);
+			phys_chipspersocket = be16_to_cpup((__be16 *)&buf->val[4]);
+			phys_coresperchip = be16_to_cpup((__be16 *)&buf->val[6]);
 		}
-		spin_unlock(&rtas_data_buf_lock);
-	} while (rtas_busy_delay(call_status));
-
-	if (call_status != 0) {
-		pr_err("Error calling get-system-parameter %d\n",
-		       call_status);
 	}
+
+	papr_sysparm_buf_free(buf);
 }
 
 /* Domains for which more than one result element are returned for each event. */

-- 
2.39.1


* [PATCH v2 17/19] powerpc/pseries/lpar: convert to papr_sysparm API
  2023-02-06 18:54 ` Nathan Lynch via B4 Submission Endpoint
@ 2023-02-06 18:54   ` Nathan Lynch via B4 Submission Endpoint
  -1 siblings, 0 replies; 50+ messages in thread
From: Nathan Lynch @ 2023-02-06 18:54 UTC (permalink / raw)
  To: Michael Ellerman, Nicholas Piggin, Christophe Leroy, Kajol Jain,
	Laurent Dufour, Mahesh J Salgaonkar, Andrew Donnellan,
	Nick Child
  Cc: linuxppc-dev, Nathan Lynch

Convert the TLB block invalidate characteristics discovery to the new
papr_sysparm API. This occurs too early in boot to use
papr_sysparm_buf_alloc(), so use a static buffer.

Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
---
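Not part of the patch: a minimal userspace sketch of the record walk
over the TLB block invalidate characteristics payload, as the loop in
the diff below appears to do it. Each record is assumed to be one
block-shift byte, one count byte, then that many page-size bytes. The
names ending in _sketch, the output array, and the extra bounds guards
are hypothetical additions for illustration only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One (page size encoding, block size) pair found in the payload. */
struct blk_entry_sketch {
	uint8_t page_size_code;	/* raw page-size byte from the payload */
	uint32_t block_size;	/* 1 << block_shift */
};

/*
 * Walk 'len' payload bytes starting at val[0] (i.e. after the 2-byte
 * length prefix) and record each pair in 'out'. Returns the number of
 * entries written, never more than 'max_out'.
 */
static size_t walk_tlb_block_attrs_sketch(const uint8_t *val, size_t len,
					  struct blk_entry_sketch *out,
					  size_t max_out)
{
	size_t idx = 0, n = 0;

	assert(val != NULL && out != NULL);

	while (idx < len) {
		uint8_t block_shift = val[idx++];
		unsigned int npsize;

		/* Sketch-only guards: shift must fit in 32 bits, and the
		 * count byte must lie within the reported length. */
		if (block_shift > 31 || idx >= len)
			break;

		for (npsize = val[idx++]; npsize > 0 && idx < len; npsize--) {
			if (n == max_out)
				return n;
			out[n].page_size_code = val[idx++];
			out[n].block_size = (uint32_t)1 << block_shift;
			n++;
		}
	}

	return n;
}
```

Because the index starts at 0 and the length excludes the prefix, no
"+ 2" adjustment is needed, which matches the simplification made in
the patch.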
 arch/powerpc/platforms/pseries/lpar.c | 37 +++++++++--------------------------
 1 file changed, 9 insertions(+), 28 deletions(-)

diff --git a/arch/powerpc/platforms/pseries/lpar.c b/arch/powerpc/platforms/pseries/lpar.c
index 6597b2126ebb..2eab323f6970 100644
--- a/arch/powerpc/platforms/pseries/lpar.c
+++ b/arch/powerpc/platforms/pseries/lpar.c
@@ -32,6 +32,7 @@
 #include <asm/iommu.h>
 #include <asm/tlb.h>
 #include <asm/cputable.h>
+#include <asm/papr-sysparm.h>
 #include <asm/udbg.h>
 #include <asm/smp.h>
 #include <asm/trace.h>
@@ -1469,8 +1470,6 @@ static inline void __init check_lp_set_hblkrm(unsigned int lp,
 	}
 }
 
-#define SPLPAR_TLB_BIC_TOKEN		50
-
 /*
  * The size of the TLB Block Invalidate Characteristics is variable. But at the
  * maximum it will be the number of possible page sizes *2 + 10 bytes.
@@ -1481,42 +1480,24 @@ static inline void __init check_lp_set_hblkrm(unsigned int lp,
 
 void __init pseries_lpar_read_hblkrm_characteristics(void)
 {
-	const s32 token = rtas_token("ibm,get-system-parameter");
-	unsigned char local_buffer[SPLPAR_TLB_BIC_MAXLENGTH];
-	int call_status, len, idx, bpsize;
+	static struct papr_sysparm_buf buf __initdata;
+	int len, idx, bpsize;
 
 	if (!firmware_has_feature(FW_FEATURE_BLOCK_REMOVE))
 		return;
 
-	do {
-		spin_lock(&rtas_data_buf_lock);
-		memset(rtas_data_buf, 0, RTAS_DATA_BUF_SIZE);
-		call_status = rtas_call(token, 3, 1, NULL, SPLPAR_TLB_BIC_TOKEN,
-					__pa(rtas_data_buf), RTAS_DATA_BUF_SIZE);
-		memcpy(local_buffer, rtas_data_buf, SPLPAR_TLB_BIC_MAXLENGTH);
-		local_buffer[SPLPAR_TLB_BIC_MAXLENGTH - 1] = '\0';
-		spin_unlock(&rtas_data_buf_lock);
-	} while (rtas_busy_delay(call_status));
-
-	if (call_status != 0) {
-		pr_warn("%s %s Error calling get-system-parameter (0x%x)\n",
-			__FILE__, __func__, call_status);
+	if (papr_sysparm_get(PAPR_SYSPARM_TLB_BLOCK_INVALIDATE_ATTRS, &buf))
 		return;
-	}
 
-	/*
-	 * The first two (2) bytes of the data in the buffer are the length of
-	 * the returned data, not counting these first two (2) bytes.
-	 */
-	len = be16_to_cpu(*((u16 *)local_buffer)) + 2;
+	len = be16_to_cpu(buf.len);
 	if (len > SPLPAR_TLB_BIC_MAXLENGTH) {
 		pr_warn("%s too large returned buffer %d", __func__, len);
 		return;
 	}
 
-	idx = 2;
+	idx = 0;
 	while (idx < len) {
-		u8 block_shift = local_buffer[idx++];
+		u8 block_shift = buf.val[idx++];
 		u32 block_size;
 		unsigned int npsize;
 
@@ -1525,9 +1506,9 @@ void __init pseries_lpar_read_hblkrm_characteristics(void)
 
 		block_size = 1 << block_shift;
 
-		for (npsize = local_buffer[idx++];
+		for (npsize = buf.val[idx++];
 		     npsize > 0 && idx < len; npsize--)
-			check_lp_set_hblkrm((unsigned int) local_buffer[idx++],
+			check_lp_set_hblkrm((unsigned int)buf.val[idx++],
 					    block_size);
 	}
 

-- 
2.39.1


* [PATCH v2 18/19] powerpc/rtas: introduce rtas_function_token() API
  2023-02-06 18:54 ` Nathan Lynch via B4 Submission Endpoint
@ 2023-02-06 18:54   ` Nathan Lynch via B4 Submission Endpoint
  -1 siblings, 0 replies; 50+ messages in thread
From: Nathan Lynch @ 2023-02-06 18:54 UTC (permalink / raw)
  To: Michael Ellerman, Nicholas Piggin, Christophe Leroy, Kajol Jain,
	Laurent Dufour, Mahesh J Salgaonkar, Andrew Donnellan,
	Nick Child
  Cc: linuxppc-dev, Nathan Lynch

Users of rtas_token() supply a string argument that can't be validated
at build time. A typo or misspelling has to be caught by inspection or
by observing wrong behavior at runtime.

Since the core RTAS code has now consolidated the names of all
possible RTAS functions and mapped them to their tokens, token lookup
can be implemented using symbolic constants to index a static array.

So introduce rtas_function_token(), a replacement API which does that,
along with a rtas_service_present()-equivalent helper,
rtas_function_implemented(). Callers supply an opaque predefined
function handle which is used internally to index the function
table. Typos or other inappropriate arguments yield build errors, and
the function handle is a type that can't be easily confused with RTAS
tokens or other integer types.

Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
---
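Not part of the patch: a minimal userspace sketch of the opaque-handle
pattern described above, reduced to three functions. All identifiers
ending in _sketch, the FNIDX_/FN_ names, and the token values are
made up for illustration; only the shape (an enum index wrapped in a
single-member struct, used to index a static table) follows the diff
below.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define UNKNOWN_SERVICE_SKETCH (-1)

enum fn_index_sketch {
	FNIDX_EVENT_SCAN,
	FNIDX_GET_TIME_OF_DAY,
	FNIDX_SYSTEM_REBOOT,
	FNIDX_COUNT,
};

/* Wrapping the index in a struct keeps plain integers from being
 * passed where a handle is expected. */
typedef struct {
	const enum fn_index_sketch index;
} fn_handle_sketch_t;

#define fn_handle_sketch(x_) ((const fn_handle_sketch_t) { .index = FNIDX_##x_ })

#define FN_EVENT_SCAN      fn_handle_sketch(EVENT_SCAN)
#define FN_GET_TIME_OF_DAY fn_handle_sketch(GET_TIME_OF_DAY)
#define FN_SYSTEM_REBOOT   fn_handle_sketch(SYSTEM_REBOOT)

struct fn_entry_sketch {
	const char *name;
	int32_t token;
};

/* Token values here are invented; real ones come from firmware. */
static const struct fn_entry_sketch fn_table_sketch[FNIDX_COUNT] = {
	[FNIDX_EVENT_SCAN]      = { "event-scan", 10 },
	[FNIDX_GET_TIME_OF_DAY] = { "get-time-of-day", 3 },
	[FNIDX_SYSTEM_REBOOT]   = { "system-reboot", UNKNOWN_SERVICE_SKETCH },
};

static_assert(sizeof(fn_table_sketch) / sizeof(fn_table_sketch[0]) == FNIDX_COUNT,
	      "table must have one slot per function index");

/* Constant-time lookup: the handle's index selects the table slot. */
static int32_t function_token_sketch(const fn_handle_sketch_t handle)
{
	const size_t index = handle.index;

	if (index >= sizeof(fn_table_sketch) / sizeof(fn_table_sketch[0]))
		return UNKNOWN_SERVICE_SKETCH;

	return fn_table_sketch[index].token;
}

static int function_implemented_sketch(const fn_handle_sketch_t handle)
{
	return function_token_sketch(handle) != UNKNOWN_SERVICE_SKETCH;
}
```

With this shape, a call such as function_token_sketch(42) or a
misspelled FN_ name fails to compile, whereas a misspelled string
passed to a name-based lookup is only noticed at runtime.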
 arch/powerpc/include/asm/rtas.h | 98 +++++++++++++++++++++++++++++++++++++++++
 arch/powerpc/kernel/rtas.c      | 22 ++++++++-
 2 files changed, 119 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/include/asm/rtas.h b/arch/powerpc/include/asm/rtas.h
index 14fe79217c26..fe400438c1fb 100644
--- a/arch/powerpc/include/asm/rtas.h
+++ b/arch/powerpc/include/asm/rtas.h
@@ -103,6 +103,99 @@ enum rtas_function_index {
 	rtas_fnidx(WRITE_PCI_CONFIG),
 };
 
+/*
+ * Opaque handle for client code to refer to RTAS functions. All valid
+ * function handles are build-time constants prefixed with RTAS_FN_.
+ */
+typedef struct {
+	const enum rtas_function_index index;
+} rtas_fn_handle_t;
+
+#define rtas_fn_handle(x_) ((const rtas_fn_handle_t) { .index = rtas_fnidx(x_), })
+
+#define RTAS_FN_CHECK_EXCEPTION                   rtas_fn_handle(CHECK_EXCEPTION)
+#define RTAS_FN_DISPLAY_CHARACTER                 rtas_fn_handle(DISPLAY_CHARACTER)
+#define RTAS_FN_EVENT_SCAN                        rtas_fn_handle(EVENT_SCAN)
+#define RTAS_FN_FREEZE_TIME_BASE                  rtas_fn_handle(FREEZE_TIME_BASE)
+#define RTAS_FN_GET_POWER_LEVEL                   rtas_fn_handle(GET_POWER_LEVEL)
+#define RTAS_FN_GET_SENSOR_STATE                  rtas_fn_handle(GET_SENSOR_STATE)
+#define RTAS_FN_GET_TERM_CHAR                     rtas_fn_handle(GET_TERM_CHAR)
+#define RTAS_FN_GET_TIME_OF_DAY                   rtas_fn_handle(GET_TIME_OF_DAY)
+#define RTAS_FN_IBM_ACTIVATE_FIRMWARE             rtas_fn_handle(IBM_ACTIVATE_FIRMWARE)
+#define RTAS_FN_IBM_CBE_START_PTCAL               rtas_fn_handle(IBM_CBE_START_PTCAL)
+#define RTAS_FN_IBM_CBE_STOP_PTCAL                rtas_fn_handle(IBM_CBE_STOP_PTCAL)
+#define RTAS_FN_IBM_CHANGE_MSI                    rtas_fn_handle(IBM_CHANGE_MSI)
+#define RTAS_FN_IBM_CLOSE_ERRINJCT                rtas_fn_handle(IBM_CLOSE_ERRINJCT)
+#define RTAS_FN_IBM_CONFIGURE_BRIDGE              rtas_fn_handle(IBM_CONFIGURE_BRIDGE)
+#define RTAS_FN_IBM_CONFIGURE_CONNECTOR           rtas_fn_handle(IBM_CONFIGURE_CONNECTOR)
+#define RTAS_FN_IBM_CONFIGURE_KERNEL_DUMP         rtas_fn_handle(IBM_CONFIGURE_KERNEL_DUMP)
+#define RTAS_FN_IBM_CONFIGURE_PE                  rtas_fn_handle(IBM_CONFIGURE_PE)
+#define RTAS_FN_IBM_CREATE_PE_DMA_WINDOW          rtas_fn_handle(IBM_CREATE_PE_DMA_WINDOW)
+#define RTAS_FN_IBM_DISPLAY_MESSAGE               rtas_fn_handle(IBM_DISPLAY_MESSAGE)
+#define RTAS_FN_IBM_ERRINJCT                      rtas_fn_handle(IBM_ERRINJCT)
+#define RTAS_FN_IBM_EXTI2C                        rtas_fn_handle(IBM_EXTI2C)
+#define RTAS_FN_IBM_GET_CONFIG_ADDR_INFO          rtas_fn_handle(IBM_GET_CONFIG_ADDR_INFO)
+#define RTAS_FN_IBM_GET_CONFIG_ADDR_INFO2         rtas_fn_handle(IBM_GET_CONFIG_ADDR_INFO2)
+#define RTAS_FN_IBM_GET_DYNAMIC_SENSOR_STATE      rtas_fn_handle(IBM_GET_DYNAMIC_SENSOR_STATE)
+#define RTAS_FN_IBM_GET_INDICES                   rtas_fn_handle(IBM_GET_INDICES)
+#define RTAS_FN_IBM_GET_RIO_TOPOLOGY              rtas_fn_handle(IBM_GET_RIO_TOPOLOGY)
+#define RTAS_FN_IBM_GET_SYSTEM_PARAMETER          rtas_fn_handle(IBM_GET_SYSTEM_PARAMETER)
+#define RTAS_FN_IBM_GET_VPD                       rtas_fn_handle(IBM_GET_VPD)
+#define RTAS_FN_IBM_GET_XIVE                      rtas_fn_handle(IBM_GET_XIVE)
+#define RTAS_FN_IBM_INT_OFF                       rtas_fn_handle(IBM_INT_OFF)
+#define RTAS_FN_IBM_INT_ON                        rtas_fn_handle(IBM_INT_ON)
+#define RTAS_FN_IBM_IO_QUIESCE_ACK                rtas_fn_handle(IBM_IO_QUIESCE_ACK)
+#define RTAS_FN_IBM_LPAR_PERFTOOLS                rtas_fn_handle(IBM_LPAR_PERFTOOLS)
+#define RTAS_FN_IBM_MANAGE_FLASH_IMAGE            rtas_fn_handle(IBM_MANAGE_FLASH_IMAGE)
+#define RTAS_FN_IBM_MANAGE_STORAGE_PRESERVATION   rtas_fn_handle(IBM_MANAGE_STORAGE_PRESERVATION)
+#define RTAS_FN_IBM_NMI_INTERLOCK                 rtas_fn_handle(IBM_NMI_INTERLOCK)
+#define RTAS_FN_IBM_NMI_REGISTER                  rtas_fn_handle(IBM_NMI_REGISTER)
+#define RTAS_FN_IBM_OPEN_ERRINJCT                 rtas_fn_handle(IBM_OPEN_ERRINJCT)
+#define RTAS_FN_IBM_OPEN_SRIOV_ALLOW_UNFREEZE     rtas_fn_handle(IBM_OPEN_SRIOV_ALLOW_UNFREEZE)
+#define RTAS_FN_IBM_OPEN_SRIOV_MAP_PE_NUMBER      rtas_fn_handle(IBM_OPEN_SRIOV_MAP_PE_NUMBER)
+#define RTAS_FN_IBM_OS_TERM                       rtas_fn_handle(IBM_OS_TERM)
+#define RTAS_FN_IBM_PARTNER_CONTROL               rtas_fn_handle(IBM_PARTNER_CONTROL)
+#define RTAS_FN_IBM_PHYSICAL_ATTESTATION          rtas_fn_handle(IBM_PHYSICAL_ATTESTATION)
+#define RTAS_FN_IBM_PLATFORM_DUMP                 rtas_fn_handle(IBM_PLATFORM_DUMP)
+#define RTAS_FN_IBM_POWER_OFF_UPS                 rtas_fn_handle(IBM_POWER_OFF_UPS)
+#define RTAS_FN_IBM_QUERY_INTERRUPT_SOURCE_NUMBER rtas_fn_handle(IBM_QUERY_INTERRUPT_SOURCE_NUMBER)
+#define RTAS_FN_IBM_QUERY_PE_DMA_WINDOW           rtas_fn_handle(IBM_QUERY_PE_DMA_WINDOW)
+#define RTAS_FN_IBM_READ_PCI_CONFIG               rtas_fn_handle(IBM_READ_PCI_CONFIG)
+#define RTAS_FN_IBM_READ_SLOT_RESET_STATE         rtas_fn_handle(IBM_READ_SLOT_RESET_STATE)
+#define RTAS_FN_IBM_READ_SLOT_RESET_STATE2        rtas_fn_handle(IBM_READ_SLOT_RESET_STATE2)
+#define RTAS_FN_IBM_REMOVE_PE_DMA_WINDOW          rtas_fn_handle(IBM_REMOVE_PE_DMA_WINDOW)
+#define RTAS_FN_IBM_RESET_PE_DMA_WINDOWS          rtas_fn_handle(IBM_RESET_PE_DMA_WINDOWS)
+#define RTAS_FN_IBM_SCAN_LOG_DUMP                 rtas_fn_handle(IBM_SCAN_LOG_DUMP)
+#define RTAS_FN_IBM_SET_DYNAMIC_INDICATOR         rtas_fn_handle(IBM_SET_DYNAMIC_INDICATOR)
+#define RTAS_FN_IBM_SET_EEH_OPTION                rtas_fn_handle(IBM_SET_EEH_OPTION)
+#define RTAS_FN_IBM_SET_SLOT_RESET                rtas_fn_handle(IBM_SET_SLOT_RESET)
+#define RTAS_FN_IBM_SET_SYSTEM_PARAMETER          rtas_fn_handle(IBM_SET_SYSTEM_PARAMETER)
+#define RTAS_FN_IBM_SET_XIVE                      rtas_fn_handle(IBM_SET_XIVE)
+#define RTAS_FN_IBM_SLOT_ERROR_DETAIL             rtas_fn_handle(IBM_SLOT_ERROR_DETAIL)
+#define RTAS_FN_IBM_SUSPEND_ME                    rtas_fn_handle(IBM_SUSPEND_ME)
+#define RTAS_FN_IBM_TUNE_DMA_PARMS                rtas_fn_handle(IBM_TUNE_DMA_PARMS)
+#define RTAS_FN_IBM_UPDATE_FLASH_64_AND_REBOOT    rtas_fn_handle(IBM_UPDATE_FLASH_64_AND_REBOOT)
+#define RTAS_FN_IBM_UPDATE_NODES                  rtas_fn_handle(IBM_UPDATE_NODES)
+#define RTAS_FN_IBM_UPDATE_PROPERTIES             rtas_fn_handle(IBM_UPDATE_PROPERTIES)
+#define RTAS_FN_IBM_VALIDATE_FLASH_IMAGE          rtas_fn_handle(IBM_VALIDATE_FLASH_IMAGE)
+#define RTAS_FN_IBM_WRITE_PCI_CONFIG              rtas_fn_handle(IBM_WRITE_PCI_CONFIG)
+#define RTAS_FN_NVRAM_FETCH                       rtas_fn_handle(NVRAM_FETCH)
+#define RTAS_FN_NVRAM_STORE                       rtas_fn_handle(NVRAM_STORE)
+#define RTAS_FN_POWER_OFF                         rtas_fn_handle(POWER_OFF)
+#define RTAS_FN_PUT_TERM_CHAR                     rtas_fn_handle(PUT_TERM_CHAR)
+#define RTAS_FN_QUERY_CPU_STOPPED_STATE           rtas_fn_handle(QUERY_CPU_STOPPED_STATE)
+#define RTAS_FN_READ_PCI_CONFIG                   rtas_fn_handle(READ_PCI_CONFIG)
+#define RTAS_FN_RTAS_LAST_ERROR                   rtas_fn_handle(RTAS_LAST_ERROR)
+#define RTAS_FN_SET_INDICATOR                     rtas_fn_handle(SET_INDICATOR)
+#define RTAS_FN_SET_POWER_LEVEL                   rtas_fn_handle(SET_POWER_LEVEL)
+#define RTAS_FN_SET_TIME_FOR_POWER_ON             rtas_fn_handle(SET_TIME_FOR_POWER_ON)
+#define RTAS_FN_SET_TIME_OF_DAY                   rtas_fn_handle(SET_TIME_OF_DAY)
+#define RTAS_FN_START_CPU                         rtas_fn_handle(START_CPU)
+#define RTAS_FN_STOP_SELF                         rtas_fn_handle(STOP_SELF)
+#define RTAS_FN_SYSTEM_REBOOT                     rtas_fn_handle(SYSTEM_REBOOT)
+#define RTAS_FN_THAW_TIME_BASE                    rtas_fn_handle(THAW_TIME_BASE)
+#define RTAS_FN_WRITE_PCI_CONFIG                  rtas_fn_handle(WRITE_PCI_CONFIG)
+
 #define RTAS_UNKNOWN_SERVICE (-1)
 #define RTAS_INSTANTIATE_MAX (1ULL<<30) /* Don't instantiate rtas at/above this value */
 
@@ -309,6 +402,11 @@ extern void (*rtas_flash_term_hook)(int);
 
 extern struct rtas_t rtas;
 
+s32 rtas_function_token(const rtas_fn_handle_t handle);
+static inline bool rtas_function_implemented(const rtas_fn_handle_t handle)
+{
+	return rtas_function_token(handle) != RTAS_UNKNOWN_SERVICE;
+}
 extern int rtas_token(const char *service);
 extern int rtas_service_present(const char *service);
 extern int rtas_call(int token, int, int, int *, ...);
diff --git a/arch/powerpc/kernel/rtas.c b/arch/powerpc/kernel/rtas.c
index 41c430dc40c2..17e59306ce63 100644
--- a/arch/powerpc/kernel/rtas.c
+++ b/arch/powerpc/kernel/rtas.c
@@ -453,6 +453,26 @@ static struct rtas_function rtas_function_table[] __ro_after_init = {
 	},
 };
 
+/**
+ * rtas_function_token() - RTAS function token lookup.
+ * @handle: Function handle, e.g. RTAS_FN_EVENT_SCAN.
+ *
+ * Context: Any context.
+ * Return: the token value for the function if implemented by this platform,
+ *         otherwise RTAS_UNKNOWN_SERVICE.
+ */
+s32 rtas_function_token(const rtas_fn_handle_t handle)
+{
+	const size_t index = handle.index;
+	const bool out_of_bounds = index >= ARRAY_SIZE(rtas_function_table);
+
+	if (WARN_ONCE(out_of_bounds, "invalid function index %zu", index))
+		return RTAS_UNKNOWN_SERVICE;
+
+	return rtas_function_table[index].token;
+}
+EXPORT_SYMBOL_GPL(rtas_function_token);
+
 static int rtas_function_cmp(const void *a, const void *b)
 {
 	const struct rtas_function *f1 = a;
@@ -1011,7 +1031,7 @@ static int ibm_errinjct_token;
  * @....: List of @nargs input parameters.
  *
  * Invokes the RTAS function indicated by @token, which the caller
- * should obtain via rtas_token().
+ * should obtain via rtas_function_token().
  *
  * The @nargs and @nret arguments must match the number of input and
  * output parameters specified for the RTAS function.

-- 
2.39.1


* [PATCH v2 18/19] powerpc/rtas: introduce rtas_function_token() API
@ 2023-02-06 18:54   ` Nathan Lynch via B4 Submission Endpoint
  0 siblings, 0 replies; 50+ messages in thread
From: Nathan Lynch via B4 Submission Endpoint @ 2023-02-06 18:54 UTC (permalink / raw)
  To: Michael Ellerman, Nicholas Piggin, Christophe Leroy, Kajol Jain,
	Laurent Dufour, Mahesh J Salgaonkar, Andrew Donnellan,
	Nick Child
  Cc: Nathan Lynch, linuxppc-dev

From: Nathan Lynch <nathanl@linux.ibm.com>

Users of rtas_token() supply a string argument that can't be validated
at build time. A typo or misspelling has to be caught by inspection or
by observing wrong behavior at runtime.

Since the core RTAS code has now consolidated the names of all
possible RTAS functions and mapped them to their tokens, token lookup
can be implemented using symbolic constants to index a static array.

So introduce rtas_function_token(), a replacement API which does that,
along with a rtas_service_present()-equivalent helper,
rtas_function_implemented(). Callers supply an opaque predefined
function handle which is used internally to index the function
table. Typos or other inappropriate arguments yield build errors, and
the function handle is a type that can't be easily confused with RTAS
tokens or other integer types.

Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
---
 arch/powerpc/include/asm/rtas.h | 98 +++++++++++++++++++++++++++++++++++++++++
 arch/powerpc/kernel/rtas.c      | 22 ++++++++-
 2 files changed, 119 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/include/asm/rtas.h b/arch/powerpc/include/asm/rtas.h
index 14fe79217c26..fe400438c1fb 100644
--- a/arch/powerpc/include/asm/rtas.h
+++ b/arch/powerpc/include/asm/rtas.h
@@ -103,6 +103,99 @@ enum rtas_function_index {
 	rtas_fnidx(WRITE_PCI_CONFIG),
 };
 
+/*
+ * Opaque handle for client code to refer to RTAS functions. All valid
+ * function handles are build-time constants prefixed with RTAS_FN_.
+ */
+typedef struct {
+	const enum rtas_function_index index;
+} rtas_fn_handle_t;
+
+#define rtas_fn_handle(x_) ((const rtas_fn_handle_t) { .index = rtas_fnidx(x_), })
+
+#define RTAS_FN_CHECK_EXCEPTION                   rtas_fn_handle(CHECK_EXCEPTION)
+#define RTAS_FN_DISPLAY_CHARACTER                 rtas_fn_handle(DISPLAY_CHARACTER)
+#define RTAS_FN_EVENT_SCAN                        rtas_fn_handle(EVENT_SCAN)
+#define RTAS_FN_FREEZE_TIME_BASE                  rtas_fn_handle(FREEZE_TIME_BASE)
+#define RTAS_FN_GET_POWER_LEVEL                   rtas_fn_handle(GET_POWER_LEVEL)
+#define RTAS_FN_GET_SENSOR_STATE                  rtas_fn_handle(GET_SENSOR_STATE)
+#define RTAS_FN_GET_TERM_CHAR                     rtas_fn_handle(GET_TERM_CHAR)
+#define RTAS_FN_GET_TIME_OF_DAY                   rtas_fn_handle(GET_TIME_OF_DAY)
+#define RTAS_FN_IBM_ACTIVATE_FIRMWARE             rtas_fn_handle(IBM_ACTIVATE_FIRMWARE)
+#define RTAS_FN_IBM_CBE_START_PTCAL               rtas_fn_handle(IBM_CBE_START_PTCAL)
+#define RTAS_FN_IBM_CBE_STOP_PTCAL                rtas_fn_handle(IBM_CBE_STOP_PTCAL)
+#define RTAS_FN_IBM_CHANGE_MSI                    rtas_fn_handle(IBM_CHANGE_MSI)
+#define RTAS_FN_IBM_CLOSE_ERRINJCT                rtas_fn_handle(IBM_CLOSE_ERRINJCT)
+#define RTAS_FN_IBM_CONFIGURE_BRIDGE              rtas_fn_handle(IBM_CONFIGURE_BRIDGE)
+#define RTAS_FN_IBM_CONFIGURE_CONNECTOR           rtas_fn_handle(IBM_CONFIGURE_CONNECTOR)
+#define RTAS_FN_IBM_CONFIGURE_KERNEL_DUMP         rtas_fn_handle(IBM_CONFIGURE_KERNEL_DUMP)
+#define RTAS_FN_IBM_CONFIGURE_PE                  rtas_fn_handle(IBM_CONFIGURE_PE)
+#define RTAS_FN_IBM_CREATE_PE_DMA_WINDOW          rtas_fn_handle(IBM_CREATE_PE_DMA_WINDOW)
+#define RTAS_FN_IBM_DISPLAY_MESSAGE               rtas_fn_handle(IBM_DISPLAY_MESSAGE)
+#define RTAS_FN_IBM_ERRINJCT                      rtas_fn_handle(IBM_ERRINJCT)
+#define RTAS_FN_IBM_EXTI2C                        rtas_fn_handle(IBM_EXTI2C)
+#define RTAS_FN_IBM_GET_CONFIG_ADDR_INFO          rtas_fn_handle(IBM_GET_CONFIG_ADDR_INFO)
+#define RTAS_FN_IBM_GET_CONFIG_ADDR_INFO2         rtas_fn_handle(IBM_GET_CONFIG_ADDR_INFO2)
+#define RTAS_FN_IBM_GET_DYNAMIC_SENSOR_STATE      rtas_fn_handle(IBM_GET_DYNAMIC_SENSOR_STATE)
+#define RTAS_FN_IBM_GET_INDICES                   rtas_fn_handle(IBM_GET_INDICES)
+#define RTAS_FN_IBM_GET_RIO_TOPOLOGY              rtas_fn_handle(IBM_GET_RIO_TOPOLOGY)
+#define RTAS_FN_IBM_GET_SYSTEM_PARAMETER          rtas_fn_handle(IBM_GET_SYSTEM_PARAMETER)
+#define RTAS_FN_IBM_GET_VPD                       rtas_fn_handle(IBM_GET_VPD)
+#define RTAS_FN_IBM_GET_XIVE                      rtas_fn_handle(IBM_GET_XIVE)
+#define RTAS_FN_IBM_INT_OFF                       rtas_fn_handle(IBM_INT_OFF)
+#define RTAS_FN_IBM_INT_ON                        rtas_fn_handle(IBM_INT_ON)
+#define RTAS_FN_IBM_IO_QUIESCE_ACK                rtas_fn_handle(IBM_IO_QUIESCE_ACK)
+#define RTAS_FN_IBM_LPAR_PERFTOOLS                rtas_fn_handle(IBM_LPAR_PERFTOOLS)
+#define RTAS_FN_IBM_MANAGE_FLASH_IMAGE            rtas_fn_handle(IBM_MANAGE_FLASH_IMAGE)
+#define RTAS_FN_IBM_MANAGE_STORAGE_PRESERVATION   rtas_fn_handle(IBM_MANAGE_STORAGE_PRESERVATION)
+#define RTAS_FN_IBM_NMI_INTERLOCK                 rtas_fn_handle(IBM_NMI_INTERLOCK)
+#define RTAS_FN_IBM_NMI_REGISTER                  rtas_fn_handle(IBM_NMI_REGISTER)
+#define RTAS_FN_IBM_OPEN_ERRINJCT                 rtas_fn_handle(IBM_OPEN_ERRINJCT)
+#define RTAS_FN_IBM_OPEN_SRIOV_ALLOW_UNFREEZE     rtas_fn_handle(IBM_OPEN_SRIOV_ALLOW_UNFREEZE)
+#define RTAS_FN_IBM_OPEN_SRIOV_MAP_PE_NUMBER      rtas_fn_handle(IBM_OPEN_SRIOV_MAP_PE_NUMBER)
+#define RTAS_FN_IBM_OS_TERM                       rtas_fn_handle(IBM_OS_TERM)
+#define RTAS_FN_IBM_PARTNER_CONTROL               rtas_fn_handle(IBM_PARTNER_CONTROL)
+#define RTAS_FN_IBM_PHYSICAL_ATTESTATION          rtas_fn_handle(IBM_PHYSICAL_ATTESTATION)
+#define RTAS_FN_IBM_PLATFORM_DUMP                 rtas_fn_handle(IBM_PLATFORM_DUMP)
+#define RTAS_FN_IBM_POWER_OFF_UPS                 rtas_fn_handle(IBM_POWER_OFF_UPS)
+#define RTAS_FN_IBM_QUERY_INTERRUPT_SOURCE_NUMBER rtas_fn_handle(IBM_QUERY_INTERRUPT_SOURCE_NUMBER)
+#define RTAS_FN_IBM_QUERY_PE_DMA_WINDOW           rtas_fn_handle(IBM_QUERY_PE_DMA_WINDOW)
+#define RTAS_FN_IBM_READ_PCI_CONFIG               rtas_fn_handle(IBM_READ_PCI_CONFIG)
+#define RTAS_FN_IBM_READ_SLOT_RESET_STATE         rtas_fn_handle(IBM_READ_SLOT_RESET_STATE)
+#define RTAS_FN_IBM_READ_SLOT_RESET_STATE2        rtas_fn_handle(IBM_READ_SLOT_RESET_STATE2)
+#define RTAS_FN_IBM_REMOVE_PE_DMA_WINDOW          rtas_fn_handle(IBM_REMOVE_PE_DMA_WINDOW)
+#define RTAS_FN_IBM_RESET_PE_DMA_WINDOWS          rtas_fn_handle(IBM_RESET_PE_DMA_WINDOWS)
+#define RTAS_FN_IBM_SCAN_LOG_DUMP                 rtas_fn_handle(IBM_SCAN_LOG_DUMP)
+#define RTAS_FN_IBM_SET_DYNAMIC_INDICATOR         rtas_fn_handle(IBM_SET_DYNAMIC_INDICATOR)
+#define RTAS_FN_IBM_SET_EEH_OPTION                rtas_fn_handle(IBM_SET_EEH_OPTION)
+#define RTAS_FN_IBM_SET_SLOT_RESET                rtas_fn_handle(IBM_SET_SLOT_RESET)
+#define RTAS_FN_IBM_SET_SYSTEM_PARAMETER          rtas_fn_handle(IBM_SET_SYSTEM_PARAMETER)
+#define RTAS_FN_IBM_SET_XIVE                      rtas_fn_handle(IBM_SET_XIVE)
+#define RTAS_FN_IBM_SLOT_ERROR_DETAIL             rtas_fn_handle(IBM_SLOT_ERROR_DETAIL)
+#define RTAS_FN_IBM_SUSPEND_ME                    rtas_fn_handle(IBM_SUSPEND_ME)
+#define RTAS_FN_IBM_TUNE_DMA_PARMS                rtas_fn_handle(IBM_TUNE_DMA_PARMS)
+#define RTAS_FN_IBM_UPDATE_FLASH_64_AND_REBOOT    rtas_fn_handle(IBM_UPDATE_FLASH_64_AND_REBOOT)
+#define RTAS_FN_IBM_UPDATE_NODES                  rtas_fn_handle(IBM_UPDATE_NODES)
+#define RTAS_FN_IBM_UPDATE_PROPERTIES             rtas_fn_handle(IBM_UPDATE_PROPERTIES)
+#define RTAS_FN_IBM_VALIDATE_FLASH_IMAGE          rtas_fn_handle(IBM_VALIDATE_FLASH_IMAGE)
+#define RTAS_FN_IBM_WRITE_PCI_CONFIG              rtas_fn_handle(IBM_WRITE_PCI_CONFIG)
+#define RTAS_FN_NVRAM_FETCH                       rtas_fn_handle(NVRAM_FETCH)
+#define RTAS_FN_NVRAM_STORE                       rtas_fn_handle(NVRAM_STORE)
+#define RTAS_FN_POWER_OFF                         rtas_fn_handle(POWER_OFF)
+#define RTAS_FN_PUT_TERM_CHAR                     rtas_fn_handle(PUT_TERM_CHAR)
+#define RTAS_FN_QUERY_CPU_STOPPED_STATE           rtas_fn_handle(QUERY_CPU_STOPPED_STATE)
+#define RTAS_FN_READ_PCI_CONFIG                   rtas_fn_handle(READ_PCI_CONFIG)
+#define RTAS_FN_RTAS_LAST_ERROR                   rtas_fn_handle(RTAS_LAST_ERROR)
+#define RTAS_FN_SET_INDICATOR                     rtas_fn_handle(SET_INDICATOR)
+#define RTAS_FN_SET_POWER_LEVEL                   rtas_fn_handle(SET_POWER_LEVEL)
+#define RTAS_FN_SET_TIME_FOR_POWER_ON             rtas_fn_handle(SET_TIME_FOR_POWER_ON)
+#define RTAS_FN_SET_TIME_OF_DAY                   rtas_fn_handle(SET_TIME_OF_DAY)
+#define RTAS_FN_START_CPU                         rtas_fn_handle(START_CPU)
+#define RTAS_FN_STOP_SELF                         rtas_fn_handle(STOP_SELF)
+#define RTAS_FN_SYSTEM_REBOOT                     rtas_fn_handle(SYSTEM_REBOOT)
+#define RTAS_FN_THAW_TIME_BASE                    rtas_fn_handle(THAW_TIME_BASE)
+#define RTAS_FN_WRITE_PCI_CONFIG                  rtas_fn_handle(WRITE_PCI_CONFIG)
+
 #define RTAS_UNKNOWN_SERVICE (-1)
 #define RTAS_INSTANTIATE_MAX (1ULL<<30) /* Don't instantiate rtas at/above this value */
 
@@ -309,6 +402,11 @@ extern void (*rtas_flash_term_hook)(int);
 
 extern struct rtas_t rtas;
 
+s32 rtas_function_token(const rtas_fn_handle_t handle);
+static inline bool rtas_function_implemented(const rtas_fn_handle_t handle)
+{
+	return rtas_function_token(handle) != RTAS_UNKNOWN_SERVICE;
+}
 extern int rtas_token(const char *service);
 extern int rtas_service_present(const char *service);
 extern int rtas_call(int token, int, int, int *, ...);
diff --git a/arch/powerpc/kernel/rtas.c b/arch/powerpc/kernel/rtas.c
index 41c430dc40c2..17e59306ce63 100644
--- a/arch/powerpc/kernel/rtas.c
+++ b/arch/powerpc/kernel/rtas.c
@@ -453,6 +453,26 @@ static struct rtas_function rtas_function_table[] __ro_after_init = {
 	},
 };
 
+/**
+ * rtas_function_token() - RTAS function token lookup.
+ * @handle: Function handle, e.g. RTAS_FN_EVENT_SCAN.
+ *
+ * Context: Any context.
+ * Return: the token value for the function if implemented by this platform,
+ *         otherwise RTAS_UNKNOWN_SERVICE.
+ */
+s32 rtas_function_token(const rtas_fn_handle_t handle)
+{
+	const size_t index = handle.index;
+	const bool out_of_bounds = index >= ARRAY_SIZE(rtas_function_table);
+
+	if (WARN_ONCE(out_of_bounds, "invalid function index %zu", index))
+		return RTAS_UNKNOWN_SERVICE;
+
+	return rtas_function_table[index].token;
+}
+EXPORT_SYMBOL_GPL(rtas_function_token);
+
 static int rtas_function_cmp(const void *a, const void *b)
 {
 	const struct rtas_function *f1 = a;
@@ -1011,7 +1031,7 @@ static int ibm_errinjct_token;
  * @....: List of @nargs input parameters.
  *
  * Invokes the RTAS function indicated by @token, which the caller
- * should obtain via rtas_token().
+ * should obtain via rtas_function_token().
  *
  * The @nargs and @nret arguments must match the number of input and
  * output parameters specified for the RTAS function.

-- 
2.39.1



* [PATCH v2 19/19] powerpc/rtas: arch-wide function token lookup conversions
  2023-02-06 18:54 ` Nathan Lynch via B4 Submission Endpoint
@ 2023-02-06 18:54   ` Nathan Lynch via B4 Submission Endpoint
  -1 siblings, 0 replies; 50+ messages in thread
From: Nathan Lynch @ 2023-02-06 18:54 UTC (permalink / raw)
  To: Michael Ellerman, Nicholas Piggin, Christophe Leroy, Kajol Jain,
	Laurent Dufour, Mahesh J Salgaonkar, Andrew Donnellan,
	Nick Child
  Cc: linuxppc-dev, Nathan Lynch

With the tokens for all implemented RTAS functions now available via
rtas_function_token(), which is a constant-time lookup and safe for
arbitrary contexts, there is no need to use rtas_token() or cache its
result.

Most conversions are trivial, but a few are worth describing in more
detail:

* Error injection token comparisons for lockdown purposes are
  consolidated into a simple predicate: token_is_restricted_errinjct().

* A couple of special cases in block_rtas_call() do not use
  rtas_token() but perform string comparisons against names in the
  function table. These are converted to compare against token values
  instead, which is logically equivalent but less expensive.

* The ibm,os-term token no longer needs to be cached at boot to avoid
  device tree traversal during panic; it is now looked up when
  needed. Only the presence of the ibm,extended-os-term property is
  still recorded at boot.

* Since rtas_function_token() accesses a read-only data structure
  without taking any locks, xmon's lookup of set-indicator can be
  performed as needed instead of cached at startup.

Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
---
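
Not part of the commit: for readers following along outside the kernel
tree, the handle-based lookup and the errinjct predicate described in
the list above can be modeled in a few lines of userspace C. This is
only a sketch under stated assumptions. Every demo_ name is
hypothetical, the table has four entries instead of the full function
list, and the token values are invented for illustration (in the kernel
they come from firmware-provided properties discovered at boot).

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace model only; all demo_ identifiers are hypothetical. */
#define DEMO_UNKNOWN_SERVICE (-1)

enum demo_fn_index {
	DEMO_FNIDX_IBM_ERRINJCT,
	DEMO_FNIDX_IBM_OPEN_ERRINJCT,
	DEMO_FNIDX_IBM_PLATFORM_DUMP,
	DEMO_FNIDX_SET_INDICATOR,
	DEMO_FNIDX_COUNT,
};

/* Wrapping the index in a struct stops callers from passing a bare int. */
typedef struct {
	const enum demo_fn_index index;
} demo_fn_handle_t;

#define demo_fn_handle(x_) \
	((const demo_fn_handle_t) { .index = DEMO_FNIDX_##x_, })

struct demo_function {
	const char *name;
	int32_t token;
};

/* Invented token values; platform-dump is modeled as not implemented. */
static const struct demo_function demo_function_table[DEMO_FNIDX_COUNT] = {
	[DEMO_FNIDX_IBM_ERRINJCT]      = { "ibm,errinjct", 38 },
	[DEMO_FNIDX_IBM_OPEN_ERRINJCT] = { "ibm,open-errinjct", 39 },
	[DEMO_FNIDX_IBM_PLATFORM_DUMP] = { "ibm,platform-dump", DEMO_UNKNOWN_SERVICE },
	[DEMO_FNIDX_SET_INDICATOR]     = { "set-indicator", 11 },
};

/* Constant-time lookup: a bounds check and an array index, no locks. */
static int32_t demo_function_token(const demo_fn_handle_t handle)
{
	const size_t index = handle.index;

	if (index >= sizeof(demo_function_table) / sizeof(demo_function_table[0]))
		return DEMO_UNKNOWN_SERVICE;

	return demo_function_table[index].token;
}

/* Token comparison replaces cached globals or string comparisons. */
static bool demo_token_is_restricted_errinjct(int32_t token)
{
	return token == demo_function_token(demo_fn_handle(IBM_OPEN_ERRINJCT)) ||
	       token == demo_function_token(demo_fn_handle(IBM_ERRINJCT));
}
```

With this table, demo_function_token(demo_fn_handle(SET_INDICATOR))
yields 11 and demo_token_is_restricted_errinjct(38) is true. Because the
lookup is only a bounds check and an array read, a caller can repeat it
at each call site instead of caching the result.
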
 arch/powerpc/kernel/rtas-proc.c               | 24 ++++----
 arch/powerpc/kernel/rtas-rtc.c                |  6 +-
 arch/powerpc/kernel/rtas.c                    | 79 +++++++++++++--------------
 arch/powerpc/kernel/rtas_flash.c              | 21 ++++---
 arch/powerpc/kernel/rtas_pci.c                |  8 +--
 arch/powerpc/kernel/rtasd.c                   |  2 +-
 arch/powerpc/platforms/52xx/efika.c           |  4 +-
 arch/powerpc/platforms/cell/ras.c             |  4 +-
 arch/powerpc/platforms/cell/smp.c             |  4 +-
 arch/powerpc/platforms/chrp/nvram.c           |  4 +-
 arch/powerpc/platforms/chrp/pci.c             |  4 +-
 arch/powerpc/platforms/chrp/setup.c           |  4 +-
 arch/powerpc/platforms/maple/setup.c          |  4 +-
 arch/powerpc/platforms/pseries/dlpar.c        |  2 +-
 arch/powerpc/platforms/pseries/eeh_pseries.c  | 22 ++++----
 arch/powerpc/platforms/pseries/hotplug-cpu.c  |  4 +-
 arch/powerpc/platforms/pseries/io_event_irq.c |  2 +-
 arch/powerpc/platforms/pseries/mobility.c     |  4 +-
 arch/powerpc/platforms/pseries/msi.c          |  4 +-
 arch/powerpc/platforms/pseries/nvram.c        |  4 +-
 arch/powerpc/platforms/pseries/pci.c          |  2 +-
 arch/powerpc/platforms/pseries/ras.c          |  2 +-
 arch/powerpc/platforms/pseries/setup.c        |  8 +--
 arch/powerpc/platforms/pseries/smp.c          |  6 +-
 arch/powerpc/sysdev/xics/ics-rtas.c           |  8 +--
 arch/powerpc/xmon/xmon.c                      | 16 +-----
 26 files changed, 119 insertions(+), 133 deletions(-)

diff --git a/arch/powerpc/kernel/rtas-proc.c b/arch/powerpc/kernel/rtas-proc.c
index 081b2b741a8c..9454b8395b6a 100644
--- a/arch/powerpc/kernel/rtas-proc.c
+++ b/arch/powerpc/kernel/rtas-proc.c
@@ -287,9 +287,9 @@ static ssize_t ppc_rtas_poweron_write(struct file *file,
 
 	rtc_time64_to_tm(nowtime, &tm);
 
-	error = rtas_call(rtas_token("set-time-for-power-on"), 7, 1, NULL, 
-			tm.tm_year + 1900, tm.tm_mon + 1, tm.tm_mday,
-			tm.tm_hour, tm.tm_min, tm.tm_sec, 0 /* nano */);
+	error = rtas_call(rtas_function_token(RTAS_FN_SET_TIME_FOR_POWER_ON), 7, 1, NULL,
+			  tm.tm_year + 1900, tm.tm_mon + 1, tm.tm_mday,
+			  tm.tm_hour, tm.tm_min, tm.tm_sec, 0 /* nano */);
 	if (error)
 		printk(KERN_WARNING "error: setting poweron time returned: %s\n", 
 				ppc_rtas_process_error(error));
@@ -350,9 +350,9 @@ static ssize_t ppc_rtas_clock_write(struct file *file,
 		return error;
 
 	rtc_time64_to_tm(nowtime, &tm);
-	error = rtas_call(rtas_token("set-time-of-day"), 7, 1, NULL, 
-			tm.tm_year + 1900, tm.tm_mon + 1, tm.tm_mday,
-			tm.tm_hour, tm.tm_min, tm.tm_sec, 0);
+	error = rtas_call(rtas_function_token(RTAS_FN_SET_TIME_OF_DAY), 7, 1, NULL,
+			  tm.tm_year + 1900, tm.tm_mon + 1, tm.tm_mday,
+			  tm.tm_hour, tm.tm_min, tm.tm_sec, 0);
 	if (error)
 		printk(KERN_WARNING "error: setting the clock returned: %s\n", 
 				ppc_rtas_process_error(error));
@@ -362,7 +362,7 @@ static ssize_t ppc_rtas_clock_write(struct file *file,
 static int ppc_rtas_clock_show(struct seq_file *m, void *v)
 {
 	int ret[8];
-	int error = rtas_call(rtas_token("get-time-of-day"), 0, 8, ret);
+	int error = rtas_call(rtas_function_token(RTAS_FN_GET_TIME_OF_DAY), 0, 8, ret);
 
 	if (error) {
 		printk(KERN_WARNING "error: reading the clock returned: %s\n", 
@@ -385,7 +385,7 @@ static int ppc_rtas_sensors_show(struct seq_file *m, void *v)
 {
 	int i,j;
 	int state, error;
-	int get_sensor_state = rtas_token("get-sensor-state");
+	int get_sensor_state = rtas_function_token(RTAS_FN_GET_SENSOR_STATE);
 
 	seq_printf(m, "RTAS (RunTime Abstraction Services) Sensor Information\n");
 	seq_printf(m, "Sensor\t\tValue\t\tCondition\tLocation\n");
@@ -708,8 +708,8 @@ static ssize_t ppc_rtas_tone_freq_write(struct file *file,
 		return error;
 
 	rtas_tone_frequency = freq; /* save it for later */
-	error = rtas_call(rtas_token("set-indicator"), 3, 1, NULL,
-			TONE_FREQUENCY, 0, freq);
+	error = rtas_call(rtas_function_token(RTAS_FN_SET_INDICATOR), 3, 1, NULL,
+			  TONE_FREQUENCY, 0, freq);
 	if (error)
 		printk(KERN_WARNING "error: setting tone frequency returned: %s\n", 
 				ppc_rtas_process_error(error));
@@ -736,8 +736,8 @@ static ssize_t ppc_rtas_tone_volume_write(struct file *file,
 		volume = 100;
 	
         rtas_tone_volume = volume; /* save it for later */
-	error = rtas_call(rtas_token("set-indicator"), 3, 1, NULL,
-			TONE_VOLUME, 0, volume);
+	error = rtas_call(rtas_function_token(RTAS_FN_SET_INDICATOR), 3, 1, NULL,
+			  TONE_VOLUME, 0, volume);
 	if (error)
 		printk(KERN_WARNING "error: setting tone volume returned: %s\n", 
 				ppc_rtas_process_error(error));
diff --git a/arch/powerpc/kernel/rtas-rtc.c b/arch/powerpc/kernel/rtas-rtc.c
index 5a31d1829bca..6996214532bd 100644
--- a/arch/powerpc/kernel/rtas-rtc.c
+++ b/arch/powerpc/kernel/rtas-rtc.c
@@ -21,7 +21,7 @@ time64_t __init rtas_get_boot_time(void)
 
 	max_wait_tb = get_tb() + tb_ticks_per_usec * 1000 * MAX_RTC_WAIT;
 	do {
-		error = rtas_call(rtas_token("get-time-of-day"), 0, 8, ret);
+		error = rtas_call(rtas_function_token(RTAS_FN_GET_TIME_OF_DAY), 0, 8, ret);
 
 		wait_time = rtas_busy_delay_time(error);
 		if (wait_time) {
@@ -53,7 +53,7 @@ void rtas_get_rtc_time(struct rtc_time *rtc_tm)
 
 	max_wait_tb = get_tb() + tb_ticks_per_usec * 1000 * MAX_RTC_WAIT;
 	do {
-		error = rtas_call(rtas_token("get-time-of-day"), 0, 8, ret);
+		error = rtas_call(rtas_function_token(RTAS_FN_GET_TIME_OF_DAY), 0, 8, ret);
 
 		wait_time = rtas_busy_delay_time(error);
 		if (wait_time) {
@@ -90,7 +90,7 @@ int rtas_set_rtc_time(struct rtc_time *tm)
 
 	max_wait_tb = get_tb() + tb_ticks_per_usec * 1000 * MAX_RTC_WAIT;
 	do {
-	        error = rtas_call(rtas_token("set-time-of-day"), 7, 1, NULL,
+		error = rtas_call(rtas_function_token(RTAS_FN_SET_TIME_OF_DAY), 7, 1, NULL,
 				  tm->tm_year + 1900, tm->tm_mon + 1,
 				  tm->tm_mday, tm->tm_hour, tm->tm_min,
 				  tm->tm_sec, 0);
diff --git a/arch/powerpc/kernel/rtas.c b/arch/powerpc/kernel/rtas.c
index 17e59306ce63..833f262a2165 100644
--- a/arch/powerpc/kernel/rtas.c
+++ b/arch/powerpc/kernel/rtas.c
@@ -776,8 +776,8 @@ void rtas_progress(char *s, unsigned short hex)
 					"ibm,display-truncation-length", NULL);
 			of_node_put(root);
 		}
-		display_character = rtas_token("display-character");
-		set_indicator = rtas_token("set-indicator");
+		display_character = rtas_function_token(RTAS_FN_DISPLAY_CHARACTER);
+		set_indicator = rtas_function_token(RTAS_FN_SET_INDICATOR);
 	}
 
 	if (display_character == RTAS_UNKNOWN_SERVICE) {
@@ -931,7 +931,6 @@ static void __init init_error_log_max(void)
 
 
 static char rtas_err_buf[RTAS_ERROR_LOG_MAX];
-static int rtas_last_error_token;
 
 /** Return a copy of the detailed error text associated with the
  *  most recent failed call to rtas.  Because the error text
@@ -941,16 +940,17 @@ static int rtas_last_error_token;
  */
 static char *__fetch_rtas_last_error(char *altbuf)
 {
+	const s32 token = rtas_function_token(RTAS_FN_RTAS_LAST_ERROR);
 	struct rtas_args err_args, save_args;
 	u32 bufsz;
 	char *buf = NULL;
 
-	if (rtas_last_error_token == -1)
+	if (token == -1)
 		return NULL;
 
 	bufsz = rtas_get_error_log_max();
 
-	err_args.token = cpu_to_be32(rtas_last_error_token);
+	err_args.token = cpu_to_be32(token);
 	err_args.nargs = cpu_to_be32(2);
 	err_args.nret = cpu_to_be32(1);
 	err_args.args[0] = cpu_to_be32(__pa(rtas_err_buf));
@@ -1019,8 +1019,11 @@ void rtas_call_unlocked(struct rtas_args *args, int token, int nargs, int nret,
 	va_end(list);
 }
 
-static int ibm_open_errinjct_token;
-static int ibm_errinjct_token;
+static bool token_is_restricted_errinjct(s32 token)
+{
+	return token == rtas_function_token(RTAS_FN_IBM_OPEN_ERRINJCT) ||
+	       token == rtas_function_token(RTAS_FN_IBM_ERRINJCT);
+}
 
 /**
  * rtas_call() - Invoke an RTAS firmware function.
@@ -1092,7 +1095,7 @@ int rtas_call(int token, int nargs, int nret, int *outputs, ...)
 	if (!rtas.entry || token == RTAS_UNKNOWN_SERVICE)
 		return -1;
 
-	if (token == ibm_open_errinjct_token || token == ibm_errinjct_token) {
+	if (token_is_restricted_errinjct(token)) {
 		/*
 		 * It would be nicer to not discard the error value
 		 * from security_locked_down(), but callers expect an
@@ -1323,7 +1326,7 @@ static int rtas_error_rc(int rtas_rc)
 
 int rtas_get_power_level(int powerdomain, int *level)
 {
-	int token = rtas_token("get-power-level");
+	int token = rtas_function_token(RTAS_FN_GET_POWER_LEVEL);
 	int rc;
 
 	if (token == RTAS_UNKNOWN_SERVICE)
@@ -1340,7 +1343,7 @@ EXPORT_SYMBOL_GPL(rtas_get_power_level);
 
 int rtas_set_power_level(int powerdomain, int level, int *setlevel)
 {
-	int token = rtas_token("set-power-level");
+	int token = rtas_function_token(RTAS_FN_SET_POWER_LEVEL);
 	int rc;
 
 	if (token == RTAS_UNKNOWN_SERVICE)
@@ -1358,7 +1361,7 @@ EXPORT_SYMBOL_GPL(rtas_set_power_level);
 
 int rtas_get_sensor(int sensor, int index, int *state)
 {
-	int token = rtas_token("get-sensor-state");
+	int token = rtas_function_token(RTAS_FN_GET_SENSOR_STATE);
 	int rc;
 
 	if (token == RTAS_UNKNOWN_SERVICE)
@@ -1376,7 +1379,7 @@ EXPORT_SYMBOL_GPL(rtas_get_sensor);
 
 int rtas_get_sensor_fast(int sensor, int index, int *state)
 {
-	int token = rtas_token("get-sensor-state");
+	int token = rtas_function_token(RTAS_FN_GET_SENSOR_STATE);
 	int rc;
 
 	if (token == RTAS_UNKNOWN_SERVICE)
@@ -1418,7 +1421,7 @@ bool rtas_indicator_present(int token, int *maxindex)
 
 int rtas_set_indicator(int indicator, int index, int new_value)
 {
-	int token = rtas_token("set-indicator");
+	int token = rtas_function_token(RTAS_FN_SET_INDICATOR);
 	int rc;
 
 	if (token == RTAS_UNKNOWN_SERVICE)
@@ -1439,8 +1442,8 @@ EXPORT_SYMBOL_GPL(rtas_set_indicator);
  */
 int rtas_set_indicator_fast(int indicator, int index, int new_value)
 {
+	int token = rtas_function_token(RTAS_FN_SET_INDICATOR);
 	int rc;
-	int token = rtas_token("set-indicator");
 
 	if (token == RTAS_UNKNOWN_SERVICE)
 		return -ENOENT;
@@ -1482,10 +1485,11 @@ int rtas_set_indicator_fast(int indicator, int index, int new_value)
  */
 int rtas_ibm_suspend_me(int *fw_status)
 {
+	int token = rtas_function_token(RTAS_FN_IBM_SUSPEND_ME);
 	int fwrc;
 	int ret;
 
-	fwrc = rtas_call(rtas_token("ibm,suspend-me"), 0, 1, NULL);
+	fwrc = rtas_call(token, 0, 1, NULL);
 
 	switch (fwrc) {
 	case 0:
@@ -1518,7 +1522,7 @@ void __noreturn rtas_restart(char *cmd)
 	if (rtas_flash_term_hook)
 		rtas_flash_term_hook(SYS_RESTART);
 	pr_emerg("system-reboot returned %d\n",
-		 rtas_call(rtas_token("system-reboot"), 0, 1, NULL));
+		 rtas_call(rtas_function_token(RTAS_FN_SYSTEM_REBOOT), 0, 1, NULL));
 	for (;;);
 }
 
@@ -1528,7 +1532,7 @@ void rtas_power_off(void)
 		rtas_flash_term_hook(SYS_POWER_OFF);
 	/* allow power on only with power button press */
 	pr_emerg("power-off returned %d\n",
-		 rtas_call(rtas_token("power-off"), 2, 1, NULL, -1, -1));
+		 rtas_call(rtas_function_token(RTAS_FN_POWER_OFF), 2, 1, NULL, -1, -1));
 	for (;;);
 }
 
@@ -1538,16 +1542,17 @@ void __noreturn rtas_halt(void)
 		rtas_flash_term_hook(SYS_HALT);
 	/* allow power on only with power button press */
 	pr_emerg("power-off returned %d\n",
-		 rtas_call(rtas_token("power-off"), 2, 1, NULL, -1, -1));
+		 rtas_call(rtas_function_token(RTAS_FN_POWER_OFF), 2, 1, NULL, -1, -1));
 	for (;;);
 }
 
 /* Must be in the RMO region, so we place it here */
 static char rtas_os_term_buf[2048];
-static s32 ibm_os_term_token = RTAS_UNKNOWN_SERVICE;
+static bool ibm_extended_os_term;
 
 void rtas_os_term(char *str)
 {
+	s32 token = rtas_function_token(RTAS_FN_IBM_OS_TERM);
 	int status;
 
 	/*
@@ -1556,7 +1561,8 @@ void rtas_os_term(char *str)
 	 * this property may terminate the partition which we want to avoid
 	 * since it interferes with panic_timeout.
 	 */
-	if (ibm_os_term_token == RTAS_UNKNOWN_SERVICE)
+
+	if (token == RTAS_UNKNOWN_SERVICE || !ibm_extended_os_term)
 		return;
 
 	snprintf(rtas_os_term_buf, 2048, "OS panic: %s", str);
@@ -1567,8 +1573,7 @@ void rtas_os_term(char *str)
 	 * schedules.
 	 */
 	do {
-		status = rtas_call(ibm_os_term_token, 1, 1, NULL,
-				   __pa(rtas_os_term_buf));
+		status = rtas_call(token, 1, 1, NULL, __pa(rtas_os_term_buf));
 	} while (rtas_busy_delay_time(status));
 
 	if (status != 0)
@@ -1588,10 +1593,9 @@ void rtas_os_term(char *str)
  */
 void rtas_activate_firmware(void)
 {
-	int token;
+	int token = rtas_function_token(RTAS_FN_IBM_ACTIVATE_FIRMWARE);
 	int fwrc;
 
-	token = rtas_token("ibm,activate-firmware");
 	if (token == RTAS_UNKNOWN_SERVICE) {
 		pr_notice("ibm,activate-firmware method unavailable\n");
 		return;
@@ -1677,6 +1681,8 @@ static bool block_rtas_call(int token, int nargs,
 {
 	const struct rtas_function *func;
 	const struct rtas_filter *f;
+	const bool is_platform_dump = token == rtas_function_token(RTAS_FN_IBM_PLATFORM_DUMP);
+	const bool is_config_conn = token == rtas_function_token(RTAS_FN_IBM_CONFIGURE_CONNECTOR);
 	u32 base, size, end;
 
 	/*
@@ -1713,8 +1719,7 @@ static bool block_rtas_call(int token, int nargs,
 		 * Special case for ibm,platform-dump - NULL buffer
 		 * address is used to indicate end of dump processing
 		 */
-		if (!strcmp(func->name, "ibm,platform-dump") &&
-		    base == 0)
+		if (is_platform_dump && base == 0)
 			return false;
 
 		if (!in_rmo_buf(base, end))
@@ -1735,8 +1740,7 @@ static bool block_rtas_call(int token, int nargs,
 		 * Special case for ibm,configure-connector where the
 		 * address can be 0
 		 */
-		if (!strcmp(func->name, "ibm,configure-connector") &&
-		    base == 0)
+		if (is_config_conn && base == 0)
 			return false;
 
 		if (!in_rmo_buf(base, end))
@@ -1791,7 +1795,7 @@ SYSCALL_DEFINE1(rtas, struct rtas_args __user *, uargs)
 	if (block_rtas_call(token, nargs, &args))
 		return -EINVAL;
 
-	if (token == ibm_open_errinjct_token || token == ibm_errinjct_token) {
+	if (token_is_restricted_errinjct(token)) {
 		int err;
 
 		err = security_locked_down(LOCKDOWN_RTAS_ERROR_INJECTION);
@@ -1800,7 +1804,7 @@ SYSCALL_DEFINE1(rtas, struct rtas_args __user *, uargs)
 	}
 
 	/* Need to handle ibm,suspend_me call specially */
-	if (token == rtas_token("ibm,suspend-me")) {
+	if (token == rtas_function_token(RTAS_FN_IBM_SUSPEND_ME)) {
 
 		/*
 		 * rtas_ibm_suspend_me assumes the streamid handle is in cpu
@@ -1935,11 +1939,10 @@ void __init rtas_initialize(void)
 	rtas_function_table_init();
 
 	/*
-	 * Discover these now to avoid device tree lookups in the
+	 * Discover this now to avoid a device tree lookup in the
 	 * panic path.
 	 */
-	if (of_property_read_bool(rtas.dev, "ibm,extended-os-term"))
-		ibm_os_term_token = rtas_token("ibm,os-term");
+	ibm_extended_os_term = of_property_read_bool(rtas.dev, "ibm,extended-os-term");
 
 	/* If RTAS was found, allocate the RMO buffer for it and look for
 	 * the stop-self token if any
@@ -1954,12 +1957,6 @@ void __init rtas_initialize(void)
 		panic("ERROR: RTAS: Failed to allocate %lx bytes below %pa\n",
 		      PAGE_SIZE, &rtas_region);
 
-#ifdef CONFIG_RTAS_ERROR_LOGGING
-	rtas_last_error_token = rtas_token("rtas-last-error");
-#endif
-	ibm_open_errinjct_token = rtas_token("ibm,open-errinjct");
-	ibm_errinjct_token = rtas_token("ibm,errinjct");
-
 	rtas_work_area_reserve_arena(rtas_region);
 }
 
@@ -2015,13 +2012,13 @@ void rtas_give_timebase(void)
 
 	raw_spin_lock_irqsave(&timebase_lock, flags);
 	hard_irq_disable();
-	rtas_call(rtas_token("freeze-time-base"), 0, 1, NULL);
+	rtas_call(rtas_function_token(RTAS_FN_FREEZE_TIME_BASE), 0, 1, NULL);
 	timebase = get_tb();
 	raw_spin_unlock(&timebase_lock);
 
 	while (timebase)
 		barrier();
-	rtas_call(rtas_token("thaw-time-base"), 0, 1, NULL);
+	rtas_call(rtas_function_token(RTAS_FN_THAW_TIME_BASE), 0, 1, NULL);
 	local_irq_restore(flags);
 }
 
diff --git a/arch/powerpc/kernel/rtas_flash.c b/arch/powerpc/kernel/rtas_flash.c
index bc817a5619d6..4caf5e3079eb 100644
--- a/arch/powerpc/kernel/rtas_flash.c
+++ b/arch/powerpc/kernel/rtas_flash.c
@@ -376,7 +376,7 @@ static void manage_flash(struct rtas_manage_flash_t *args_buf, unsigned int op)
 	s32 rc;
 
 	do {
-		rc = rtas_call(rtas_token("ibm,manage-flash-image"), 1, 1,
+		rc = rtas_call(rtas_function_token(RTAS_FN_IBM_MANAGE_FLASH_IMAGE), 1, 1,
 			       NULL, op);
 	} while (rtas_busy_delay(rc));
 
@@ -444,7 +444,7 @@ static ssize_t manage_flash_write(struct file *file, const char __user *buf,
  */
 static void validate_flash(struct rtas_validate_flash_t *args_buf)
 {
-	int token = rtas_token("ibm,validate-flash-image");
+	int token = rtas_function_token(RTAS_FN_IBM_VALIDATE_FLASH_IMAGE);
 	int update_results;
 	s32 rc;	
 
@@ -570,7 +570,7 @@ static void rtas_flash_firmware(int reboot_type)
 		return;
 	}
 
-	update_token = rtas_token("ibm,update-flash-64-and-reboot");
+	update_token = rtas_function_token(RTAS_FN_IBM_UPDATE_FLASH_64_AND_REBOOT);
 	if (update_token == RTAS_UNKNOWN_SERVICE) {
 		printk(KERN_ALERT "FLASH: ibm,update-flash-64-and-reboot "
 		       "is not available -- not a service partition?\n");
@@ -653,7 +653,7 @@ static void rtas_flash_firmware(int reboot_type)
  */
 struct rtas_flash_file {
 	const char *filename;
-	const char *rtas_call_name;
+	const rtas_fn_handle_t handle;
 	int *status;
 	const struct proc_ops ops;
 };
@@ -661,7 +661,7 @@ struct rtas_flash_file {
 static const struct rtas_flash_file rtas_flash_files[] = {
 	{
 		.filename	= "powerpc/rtas/" FIRMWARE_FLASH_NAME,
-		.rtas_call_name	= "ibm,update-flash-64-and-reboot",
+		.handle		= RTAS_FN_IBM_UPDATE_FLASH_64_AND_REBOOT,
 		.status		= &rtas_update_flash_data.status,
 		.ops.proc_read	= rtas_flash_read_msg,
 		.ops.proc_write	= rtas_flash_write,
@@ -670,7 +670,7 @@ static const struct rtas_flash_file rtas_flash_files[] = {
 	},
 	{
 		.filename	= "powerpc/rtas/" FIRMWARE_UPDATE_NAME,
-		.rtas_call_name	= "ibm,update-flash-64-and-reboot",
+		.handle		= RTAS_FN_IBM_UPDATE_FLASH_64_AND_REBOOT,
 		.status		= &rtas_update_flash_data.status,
 		.ops.proc_read	= rtas_flash_read_num,
 		.ops.proc_write	= rtas_flash_write,
@@ -679,7 +679,7 @@ static const struct rtas_flash_file rtas_flash_files[] = {
 	},
 	{
 		.filename	= "powerpc/rtas/" VALIDATE_FLASH_NAME,
-		.rtas_call_name	= "ibm,validate-flash-image",
+		.handle		= RTAS_FN_IBM_VALIDATE_FLASH_IMAGE,
 		.status		= &rtas_validate_flash_data.status,
 		.ops.proc_read	= validate_flash_read,
 		.ops.proc_write	= validate_flash_write,
@@ -688,7 +688,7 @@ static const struct rtas_flash_file rtas_flash_files[] = {
 	},
 	{
 		.filename	= "powerpc/rtas/" MANAGE_FLASH_NAME,
-		.rtas_call_name	= "ibm,manage-flash-image",
+		.handle		= RTAS_FN_IBM_MANAGE_FLASH_IMAGE,
 		.status		= &rtas_manage_flash_data.status,
 		.ops.proc_read	= manage_flash_read,
 		.ops.proc_write	= manage_flash_write,
@@ -700,8 +700,7 @@ static int __init rtas_flash_init(void)
 {
 	int i;
 
-	if (rtas_token("ibm,update-flash-64-and-reboot") ==
-		       RTAS_UNKNOWN_SERVICE) {
+	if (rtas_function_token(RTAS_FN_IBM_UPDATE_FLASH_64_AND_REBOOT) == RTAS_UNKNOWN_SERVICE) {
 		pr_info("rtas_flash: no firmware flash support\n");
 		return -EINVAL;
 	}
@@ -730,7 +729,7 @@ static int __init rtas_flash_init(void)
 		 * This code assumes that the status int is the first member of the
 		 * struct
 		 */
-		token = rtas_token(f->rtas_call_name);
+		token = rtas_function_token(f->handle);
 		if (token == RTAS_UNKNOWN_SERVICE)
 			*f->status = FLASH_AUTH;
 		else
diff --git a/arch/powerpc/kernel/rtas_pci.c b/arch/powerpc/kernel/rtas_pci.c
index 5a2f5ea3b054..e1fdc7473b72 100644
--- a/arch/powerpc/kernel/rtas_pci.c
+++ b/arch/powerpc/kernel/rtas_pci.c
@@ -191,10 +191,10 @@ static void python_countermeasures(struct device_node *dev)
 
 void __init init_pci_config_tokens(void)
 {
-	read_pci_config = rtas_token("read-pci-config");
-	write_pci_config = rtas_token("write-pci-config");
-	ibm_read_pci_config = rtas_token("ibm,read-pci-config");
-	ibm_write_pci_config = rtas_token("ibm,write-pci-config");
+	read_pci_config = rtas_function_token(RTAS_FN_READ_PCI_CONFIG);
+	write_pci_config = rtas_function_token(RTAS_FN_WRITE_PCI_CONFIG);
+	ibm_read_pci_config = rtas_function_token(RTAS_FN_IBM_READ_PCI_CONFIG);
+	ibm_write_pci_config = rtas_function_token(RTAS_FN_IBM_WRITE_PCI_CONFIG);
 }
 
 unsigned long get_phb_buid(struct device_node *phb)
diff --git a/arch/powerpc/kernel/rtasd.c b/arch/powerpc/kernel/rtasd.c
index cc56ac6ba4b0..9bba469239fc 100644
--- a/arch/powerpc/kernel/rtasd.c
+++ b/arch/powerpc/kernel/rtasd.c
@@ -506,7 +506,7 @@ static int __init rtas_event_scan_init(void)
 		return 0;
 
 	/* No RTAS */
-	event_scan = rtas_token("event-scan");
+	event_scan = rtas_function_token(RTAS_FN_EVENT_SCAN);
 	if (event_scan == RTAS_UNKNOWN_SERVICE) {
 		printk(KERN_INFO "rtasd: No event-scan on system\n");
 		return -ENODEV;
diff --git a/arch/powerpc/platforms/52xx/efika.c b/arch/powerpc/platforms/52xx/efika.c
index e0647720ed5e..61dfec74ff85 100644
--- a/arch/powerpc/platforms/52xx/efika.c
+++ b/arch/powerpc/platforms/52xx/efika.c
@@ -41,7 +41,7 @@ static int rtas_read_config(struct pci_bus *bus, unsigned int devfn, int offset,
 	int ret = -1;
 	int rval;
 
-	rval = rtas_call(rtas_token("read-pci-config"), 2, 2, &ret, addr, len);
+	rval = rtas_call(rtas_function_token(RTAS_FN_READ_PCI_CONFIG), 2, 2, &ret, addr, len);
 	*val = ret;
 	return rval ? PCIBIOS_DEVICE_NOT_FOUND : PCIBIOS_SUCCESSFUL;
 }
@@ -55,7 +55,7 @@ static int rtas_write_config(struct pci_bus *bus, unsigned int devfn,
 	    | (hose->global_number << 24);
 	int rval;
 
-	rval = rtas_call(rtas_token("write-pci-config"), 3, 1, NULL,
+	rval = rtas_call(rtas_function_token(RTAS_FN_WRITE_PCI_CONFIG), 3, 1, NULL,
 			 addr, len, val);
 	return rval ? PCIBIOS_DEVICE_NOT_FOUND : PCIBIOS_SUCCESSFUL;
 }
diff --git a/arch/powerpc/platforms/cell/ras.c b/arch/powerpc/platforms/cell/ras.c
index 8d934ea6270c..98db63b72d56 100644
--- a/arch/powerpc/platforms/cell/ras.c
+++ b/arch/powerpc/platforms/cell/ras.c
@@ -297,8 +297,8 @@ int cbe_sysreset_hack(void)
 static int __init cbe_ptcal_init(void)
 {
 	int ret;
-	ptcal_start_tok = rtas_token("ibm,cbe-start-ptcal");
-	ptcal_stop_tok = rtas_token("ibm,cbe-stop-ptcal");
+	ptcal_start_tok = rtas_function_token(RTAS_FN_IBM_CBE_START_PTCAL);
+	ptcal_stop_tok = rtas_function_token(RTAS_FN_IBM_CBE_STOP_PTCAL);
 
 	if (ptcal_start_tok == RTAS_UNKNOWN_SERVICE
 			|| ptcal_stop_tok == RTAS_UNKNOWN_SERVICE)
diff --git a/arch/powerpc/platforms/cell/smp.c b/arch/powerpc/platforms/cell/smp.c
index 31ce00b52a32..30394c6f8894 100644
--- a/arch/powerpc/platforms/cell/smp.c
+++ b/arch/powerpc/platforms/cell/smp.c
@@ -81,7 +81,7 @@ static inline int smp_startup_cpu(unsigned int lcpu)
 	 * If the RTAS start-cpu token does not exist then presume the
 	 * cpu is already spinning.
 	 */
-	start_cpu = rtas_token("start-cpu");
+	start_cpu = rtas_function_token(RTAS_FN_START_CPU);
 	if (start_cpu == RTAS_UNKNOWN_SERVICE)
 		return 1;
 
@@ -152,7 +152,7 @@ void __init smp_init_cell(void)
 	cpumask_clear_cpu(boot_cpuid, &of_spin_map);
 
 	/* Non-lpar has additional take/give timebase */
-	if (rtas_token("freeze-time-base") != RTAS_UNKNOWN_SERVICE) {
+	if (rtas_function_token(RTAS_FN_FREEZE_TIME_BASE) != RTAS_UNKNOWN_SERVICE) {
 		smp_ops->give_timebase = rtas_give_timebase;
 		smp_ops->take_timebase = rtas_take_timebase;
 	}
diff --git a/arch/powerpc/platforms/chrp/nvram.c b/arch/powerpc/platforms/chrp/nvram.c
index dab78076fedb..0eedae96498c 100644
--- a/arch/powerpc/platforms/chrp/nvram.c
+++ b/arch/powerpc/platforms/chrp/nvram.c
@@ -31,7 +31,7 @@ static unsigned char chrp_nvram_read_val(int addr)
 		return 0xff;
 	}
 	spin_lock_irqsave(&nvram_lock, flags);
-	if ((rtas_call(rtas_token("nvram-fetch"), 3, 2, &done, addr,
+	if ((rtas_call(rtas_function_token(RTAS_FN_NVRAM_FETCH), 3, 2, &done, addr,
 		       __pa(nvram_buf), 1) != 0) || 1 != done)
 		ret = 0xff;
 	else
@@ -53,7 +53,7 @@ static void chrp_nvram_write_val(int addr, unsigned char val)
 	}
 	spin_lock_irqsave(&nvram_lock, flags);
 	nvram_buf[0] = val;
-	if ((rtas_call(rtas_token("nvram-store"), 3, 2, &done, addr,
+	if ((rtas_call(rtas_function_token(RTAS_FN_NVRAM_STORE), 3, 2, &done, addr,
 		       __pa(nvram_buf), 1) != 0) || 1 != done)
 		printk(KERN_DEBUG "rtas IO error storing 0x%02x at %d", val, addr);
 	spin_unlock_irqrestore(&nvram_lock, flags);
diff --git a/arch/powerpc/platforms/chrp/pci.c b/arch/powerpc/platforms/chrp/pci.c
index 6f6598e771ff..428fd2a7b3ee 100644
--- a/arch/powerpc/platforms/chrp/pci.c
+++ b/arch/powerpc/platforms/chrp/pci.c
@@ -104,7 +104,7 @@ static int rtas_read_config(struct pci_bus *bus, unsigned int devfn, int offset,
         int ret = -1;
 	int rval;
 
-	rval = rtas_call(rtas_token("read-pci-config"), 2, 2, &ret, addr, len);
+	rval = rtas_call(rtas_function_token(RTAS_FN_READ_PCI_CONFIG), 2, 2, &ret, addr, len);
 	*val = ret;
 	return rval? PCIBIOS_DEVICE_NOT_FOUND: PCIBIOS_SUCCESSFUL;
 }
@@ -118,7 +118,7 @@ static int rtas_write_config(struct pci_bus *bus, unsigned int devfn, int offset
 		| (hose->global_number << 24);
 	int rval;
 
-	rval = rtas_call(rtas_token("write-pci-config"), 3, 1, NULL,
+	rval = rtas_call(rtas_function_token(RTAS_FN_WRITE_PCI_CONFIG), 3, 1, NULL,
 			 addr, len, val);
 	return rval? PCIBIOS_DEVICE_NOT_FOUND: PCIBIOS_SUCCESSFUL;
 }
diff --git a/arch/powerpc/platforms/chrp/setup.c b/arch/powerpc/platforms/chrp/setup.c
index ec63c0558db6..d9049ceb1046 100644
--- a/arch/powerpc/platforms/chrp/setup.c
+++ b/arch/powerpc/platforms/chrp/setup.c
@@ -323,11 +323,11 @@ static void __init chrp_setup_arch(void)
 	printk("chrp type = %x [%s]\n", _chrp_type, chrp_names[_chrp_type]);
 
 	rtas_initialize();
-	if (rtas_token("display-character") >= 0)
+	if (rtas_function_token(RTAS_FN_DISPLAY_CHARACTER) >= 0)
 		ppc_md.progress = rtas_progress;
 
 	/* use RTAS time-of-day routines if available */
-	if (rtas_token("get-time-of-day") != RTAS_UNKNOWN_SERVICE) {
+	if (rtas_function_token(RTAS_FN_GET_TIME_OF_DAY) != RTAS_UNKNOWN_SERVICE) {
 		ppc_md.get_boot_time	= rtas_get_boot_time;
 		ppc_md.get_rtc_time	= rtas_get_rtc_time;
 		ppc_md.set_rtc_time	= rtas_set_rtc_time;
diff --git a/arch/powerpc/platforms/maple/setup.c b/arch/powerpc/platforms/maple/setup.c
index c26c379e1cc8..98c8e3603064 100644
--- a/arch/powerpc/platforms/maple/setup.c
+++ b/arch/powerpc/platforms/maple/setup.c
@@ -162,8 +162,8 @@ static struct smp_ops_t maple_smp_ops = {
 
 static void __init maple_use_rtas_reboot_and_halt_if_present(void)
 {
-	if (rtas_service_present("system-reboot") &&
-	    rtas_service_present("power-off")) {
+	if (rtas_function_implemented(RTAS_FN_SYSTEM_REBOOT) &&
+	    rtas_function_implemented(RTAS_FN_POWER_OFF)) {
 		ppc_md.restart = rtas_restart;
 		pm_power_off = rtas_power_off;
 		ppc_md.halt = rtas_halt;
diff --git a/arch/powerpc/platforms/pseries/dlpar.c b/arch/powerpc/platforms/pseries/dlpar.c
index 9b65b50a5456..75ffdbcd2865 100644
--- a/arch/powerpc/platforms/pseries/dlpar.c
+++ b/arch/powerpc/platforms/pseries/dlpar.c
@@ -143,7 +143,7 @@ struct device_node *dlpar_configure_connector(__be32 drc_index,
 	int cc_token;
 	int rc = -1;
 
-	cc_token = rtas_token("ibm,configure-connector");
+	cc_token = rtas_function_token(RTAS_FN_IBM_CONFIGURE_CONNECTOR);
 	if (cc_token == RTAS_UNKNOWN_SERVICE)
 		return NULL;
 
diff --git a/arch/powerpc/platforms/pseries/eeh_pseries.c b/arch/powerpc/platforms/pseries/eeh_pseries.c
index 6b507b62ce8f..def184da51cf 100644
--- a/arch/powerpc/platforms/pseries/eeh_pseries.c
+++ b/arch/powerpc/platforms/pseries/eeh_pseries.c
@@ -699,7 +699,7 @@ static int pseries_eeh_write_config(struct eeh_dev *edev, int where, int size, u
 static int pseries_send_allow_unfreeze(struct pci_dn *pdn, u16 *vf_pe_array, int cur_vfs)
 {
 	int rc;
-	int ibm_allow_unfreeze = rtas_token("ibm,open-sriov-allow-unfreeze");
+	int ibm_allow_unfreeze = rtas_function_token(RTAS_FN_IBM_OPEN_SRIOV_ALLOW_UNFREEZE);
 	unsigned long buid, addr;
 
 	addr = rtas_config_addr(pdn->busno, pdn->devfn, 0);
@@ -774,7 +774,7 @@ static int pseries_notify_resume(struct eeh_dev *edev)
 	if (!edev)
 		return -EEXIST;
 
-	if (rtas_token("ibm,open-sriov-allow-unfreeze") == RTAS_UNKNOWN_SERVICE)
+	if (rtas_function_token(RTAS_FN_IBM_OPEN_SRIOV_ALLOW_UNFREEZE) == RTAS_UNKNOWN_SERVICE)
 		return -EINVAL;
 
 	if (edev->pdev->is_physfn || edev->pdev->is_virtfn)
@@ -815,14 +815,14 @@ static int __init eeh_pseries_init(void)
 	int ret, config_addr;
 
 	/* figure out EEH RTAS function call tokens */
-	ibm_set_eeh_option		= rtas_token("ibm,set-eeh-option");
-	ibm_set_slot_reset		= rtas_token("ibm,set-slot-reset");
-	ibm_read_slot_reset_state2	= rtas_token("ibm,read-slot-reset-state2");
-	ibm_read_slot_reset_state	= rtas_token("ibm,read-slot-reset-state");
-	ibm_slot_error_detail		= rtas_token("ibm,slot-error-detail");
-	ibm_get_config_addr_info2	= rtas_token("ibm,get-config-addr-info2");
-	ibm_get_config_addr_info	= rtas_token("ibm,get-config-addr-info");
-	ibm_configure_pe		= rtas_token("ibm,configure-pe");
+	ibm_set_eeh_option		= rtas_function_token(RTAS_FN_IBM_SET_EEH_OPTION);
+	ibm_set_slot_reset		= rtas_function_token(RTAS_FN_IBM_SET_SLOT_RESET);
+	ibm_read_slot_reset_state2	= rtas_function_token(RTAS_FN_IBM_READ_SLOT_RESET_STATE2);
+	ibm_read_slot_reset_state	= rtas_function_token(RTAS_FN_IBM_READ_SLOT_RESET_STATE);
+	ibm_slot_error_detail		= rtas_function_token(RTAS_FN_IBM_SLOT_ERROR_DETAIL);
+	ibm_get_config_addr_info2	= rtas_function_token(RTAS_FN_IBM_GET_CONFIG_ADDR_INFO2);
+	ibm_get_config_addr_info	= rtas_function_token(RTAS_FN_IBM_GET_CONFIG_ADDR_INFO);
+	ibm_configure_pe		= rtas_function_token(RTAS_FN_IBM_CONFIGURE_PE);
 
 	/*
 	 * ibm,configure-pe and ibm,configure-bridge have the same semantics,
@@ -830,7 +830,7 @@ static int __init eeh_pseries_init(void)
 	 * ibm,configure-pe then fall back to using ibm,configure-bridge.
 	 */
 	if (ibm_configure_pe == RTAS_UNKNOWN_SERVICE)
-		ibm_configure_pe	= rtas_token("ibm,configure-bridge");
+		ibm_configure_pe	= rtas_function_token(RTAS_FN_IBM_CONFIGURE_BRIDGE);
 
 	/*
 	 * Necessary sanity check. We needn't check "get-config-addr-info"
diff --git a/arch/powerpc/platforms/pseries/hotplug-cpu.c b/arch/powerpc/platforms/pseries/hotplug-cpu.c
index 090ae5a1e0f5..982e5e4b5e06 100644
--- a/arch/powerpc/platforms/pseries/hotplug-cpu.c
+++ b/arch/powerpc/platforms/pseries/hotplug-cpu.c
@@ -855,8 +855,8 @@ static int __init pseries_cpu_hotplug_init(void)
 	ppc_md.cpu_release = dlpar_cpu_release;
 #endif /* CONFIG_ARCH_CPU_PROBE_RELEASE */
 
-	rtas_stop_self_token = rtas_token("stop-self");
-	qcss_tok = rtas_token("query-cpu-stopped-state");
+	rtas_stop_self_token = rtas_function_token(RTAS_FN_STOP_SELF);
+	qcss_tok = rtas_function_token(RTAS_FN_QUERY_CPU_STOPPED_STATE);
 
 	if (rtas_stop_self_token == RTAS_UNKNOWN_SERVICE ||
 			qcss_tok == RTAS_UNKNOWN_SERVICE) {
diff --git a/arch/powerpc/platforms/pseries/io_event_irq.c b/arch/powerpc/platforms/pseries/io_event_irq.c
index 7b74d4d34e9a..f411d4fe7b24 100644
--- a/arch/powerpc/platforms/pseries/io_event_irq.c
+++ b/arch/powerpc/platforms/pseries/io_event_irq.c
@@ -143,7 +143,7 @@ static int __init ioei_init(void)
 {
 	struct device_node *np;
 
-	ioei_check_exception_token = rtas_token("check-exception");
+	ioei_check_exception_token = rtas_function_token(RTAS_FN_CHECK_EXCEPTION);
 	if (ioei_check_exception_token == RTAS_UNKNOWN_SERVICE)
 		return -ENODEV;
 
diff --git a/arch/powerpc/platforms/pseries/mobility.c b/arch/powerpc/platforms/pseries/mobility.c
index 4cea71aa0f41..643d309d1bd0 100644
--- a/arch/powerpc/platforms/pseries/mobility.c
+++ b/arch/powerpc/platforms/pseries/mobility.c
@@ -195,7 +195,7 @@ static int update_dt_node(struct device_node *dn, s32 scope)
 	u32 nprops;
 	u32 vd;
 
-	update_properties_token = rtas_token("ibm,update-properties");
+	update_properties_token = rtas_function_token(RTAS_FN_IBM_UPDATE_PROPERTIES);
 	if (update_properties_token == RTAS_UNKNOWN_SERVICE)
 		return -EINVAL;
 
@@ -306,7 +306,7 @@ static int pseries_devicetree_update(s32 scope)
 	int update_nodes_token;
 	int rc;
 
-	update_nodes_token = rtas_token("ibm,update-nodes");
+	update_nodes_token = rtas_function_token(RTAS_FN_IBM_UPDATE_NODES);
 	if (update_nodes_token == RTAS_UNKNOWN_SERVICE)
 		return 0;
 
diff --git a/arch/powerpc/platforms/pseries/msi.c b/arch/powerpc/platforms/pseries/msi.c
index 3f05507e444d..423ee1d5bd94 100644
--- a/arch/powerpc/platforms/pseries/msi.c
+++ b/arch/powerpc/platforms/pseries/msi.c
@@ -679,8 +679,8 @@ static void rtas_msi_pci_irq_fixup(struct pci_dev *pdev)
 
 static int rtas_msi_init(void)
 {
-	query_token  = rtas_token("ibm,query-interrupt-source-number");
-	change_token = rtas_token("ibm,change-msi");
+	query_token  = rtas_function_token(RTAS_FN_IBM_QUERY_INTERRUPT_SOURCE_NUMBER);
+	change_token = rtas_function_token(RTAS_FN_IBM_CHANGE_MSI);
 
 	if ((query_token == RTAS_UNKNOWN_SERVICE) ||
 			(change_token == RTAS_UNKNOWN_SERVICE)) {
diff --git a/arch/powerpc/platforms/pseries/nvram.c b/arch/powerpc/platforms/pseries/nvram.c
index cbf1720eb4aa..8130c37962c0 100644
--- a/arch/powerpc/platforms/pseries/nvram.c
+++ b/arch/powerpc/platforms/pseries/nvram.c
@@ -227,8 +227,8 @@ int __init pSeries_nvram_init(void)
 
 	nvram_size = be32_to_cpup(nbytes_p);
 
-	nvram_fetch = rtas_token("nvram-fetch");
-	nvram_store = rtas_token("nvram-store");
+	nvram_fetch = rtas_function_token(RTAS_FN_NVRAM_FETCH);
+	nvram_store = rtas_function_token(RTAS_FN_NVRAM_STORE);
 	printk(KERN_INFO "PPC64 nvram contains %d bytes\n", nvram_size);
 	of_node_put(nvram);
 
diff --git a/arch/powerpc/platforms/pseries/pci.c b/arch/powerpc/platforms/pseries/pci.c
index 6e671c3809ec..60e0a58928ef 100644
--- a/arch/powerpc/platforms/pseries/pci.c
+++ b/arch/powerpc/platforms/pseries/pci.c
@@ -60,7 +60,7 @@ static int pseries_send_map_pe(struct pci_dev *pdev, u16 num_vfs,
 	struct pci_dn *pdn;
 	int rc;
 	unsigned long buid, addr;
-	int ibm_map_pes = rtas_token("ibm,open-sriov-map-pe-number");
+	int ibm_map_pes = rtas_function_token(RTAS_FN_IBM_OPEN_SRIOV_MAP_PE_NUMBER);
 
 	if (ibm_map_pes == RTAS_UNKNOWN_SERVICE)
 		return -EINVAL;
diff --git a/arch/powerpc/platforms/pseries/ras.c b/arch/powerpc/platforms/pseries/ras.c
index f12516c3998c..adafd593d9d3 100644
--- a/arch/powerpc/platforms/pseries/ras.c
+++ b/arch/powerpc/platforms/pseries/ras.c
@@ -155,7 +155,7 @@ static int __init init_ras_IRQ(void)
 {
 	struct device_node *np;
 
-	ras_check_exception_token = rtas_token("check-exception");
+	ras_check_exception_token = rtas_function_token(RTAS_FN_CHECK_EXCEPTION);
 
 	/* Internal Errors */
 	np = of_find_node_by_path("/event-sources/internal-errors");
diff --git a/arch/powerpc/platforms/pseries/setup.c b/arch/powerpc/platforms/pseries/setup.c
index 420a2fa48292..4a0cec8cf623 100644
--- a/arch/powerpc/platforms/pseries/setup.c
+++ b/arch/powerpc/platforms/pseries/setup.c
@@ -136,11 +136,11 @@ static void __init fwnmi_init(void)
 #endif
 	int ibm_nmi_register_token;
 
-	ibm_nmi_register_token = rtas_token("ibm,nmi-register");
+	ibm_nmi_register_token = rtas_function_token(RTAS_FN_IBM_NMI_REGISTER);
 	if (ibm_nmi_register_token == RTAS_UNKNOWN_SERVICE)
 		return;
 
-	ibm_nmi_interlock_token = rtas_token("ibm,nmi-interlock");
+	ibm_nmi_interlock_token = rtas_function_token(RTAS_FN_IBM_NMI_INTERLOCK);
 	if (WARN_ON(ibm_nmi_interlock_token == RTAS_UNKNOWN_SERVICE))
 		return;
 
@@ -1071,14 +1071,14 @@ static void __init pseries_init(void)
 static void pseries_power_off(void)
 {
 	int rc;
-	int rtas_poweroff_ups_token = rtas_token("ibm,power-off-ups");
+	int rtas_poweroff_ups_token = rtas_function_token(RTAS_FN_IBM_POWER_OFF_UPS);
 
 	if (rtas_flash_term_hook)
 		rtas_flash_term_hook(SYS_POWER_OFF);
 
 	if (rtas_poweron_auto == 0 ||
 		rtas_poweroff_ups_token == RTAS_UNKNOWN_SERVICE) {
-		rc = rtas_call(rtas_token("power-off"), 2, 1, NULL, -1, -1);
+		rc = rtas_call(rtas_function_token(RTAS_FN_POWER_OFF), 2, 1, NULL, -1, -1);
 		printk(KERN_INFO "RTAS power-off returned %d\n", rc);
 	} else {
 		rc = rtas_call(rtas_poweroff_ups_token, 0, 1, NULL);
diff --git a/arch/powerpc/platforms/pseries/smp.c b/arch/powerpc/platforms/pseries/smp.c
index 2bcfee86ff87..c597711ef20a 100644
--- a/arch/powerpc/platforms/pseries/smp.c
+++ b/arch/powerpc/platforms/pseries/smp.c
@@ -55,7 +55,7 @@ static cpumask_var_t of_spin_mask;
 int smp_query_cpu_stopped(unsigned int pcpu)
 {
 	int cpu_status, status;
-	int qcss_tok = rtas_token("query-cpu-stopped-state");
+	int qcss_tok = rtas_function_token(RTAS_FN_QUERY_CPU_STOPPED_STATE);
 
 	if (qcss_tok == RTAS_UNKNOWN_SERVICE) {
 		printk_once(KERN_INFO
@@ -108,7 +108,7 @@ static inline int smp_startup_cpu(unsigned int lcpu)
 	 * If the RTAS start-cpu token does not exist then presume the
 	 * cpu is already spinning.
 	 */
-	start_cpu = rtas_token("start-cpu");
+	start_cpu = rtas_function_token(RTAS_FN_START_CPU);
 	if (start_cpu == RTAS_UNKNOWN_SERVICE)
 		return 1;
 
@@ -266,7 +266,7 @@ void __init smp_init_pseries(void)
 	 * We know prom_init will not have started them if RTAS supports
 	 * query-cpu-stopped-state.
 	 */
-	if (rtas_token("query-cpu-stopped-state") == RTAS_UNKNOWN_SERVICE) {
+	if (rtas_function_token(RTAS_FN_QUERY_CPU_STOPPED_STATE) == RTAS_UNKNOWN_SERVICE) {
 		if (cpu_has_feature(CPU_FTR_SMT)) {
 			for_each_present_cpu(i) {
 				if (cpu_thread_in_core(i) == 0)
diff --git a/arch/powerpc/sysdev/xics/ics-rtas.c b/arch/powerpc/sysdev/xics/ics-rtas.c
index f8320f8e5bc7..b772a833d9b7 100644
--- a/arch/powerpc/sysdev/xics/ics-rtas.c
+++ b/arch/powerpc/sysdev/xics/ics-rtas.c
@@ -200,10 +200,10 @@ static struct ics ics_rtas = {
 
 __init int ics_rtas_init(void)
 {
-	ibm_get_xive = rtas_token("ibm,get-xive");
-	ibm_set_xive = rtas_token("ibm,set-xive");
-	ibm_int_on  = rtas_token("ibm,int-on");
-	ibm_int_off = rtas_token("ibm,int-off");
+	ibm_get_xive = rtas_function_token(RTAS_FN_IBM_GET_XIVE);
+	ibm_set_xive = rtas_function_token(RTAS_FN_IBM_SET_XIVE);
+	ibm_int_on  = rtas_function_token(RTAS_FN_IBM_INT_ON);
+	ibm_int_off = rtas_function_token(RTAS_FN_IBM_INT_OFF);
 
 	/* We enable the RTAS "ICS" if RTAS is present with the
 	 * appropriate tokens
diff --git a/arch/powerpc/xmon/xmon.c b/arch/powerpc/xmon/xmon.c
index 0da66bc4823d..73c620c2a3a1 100644
--- a/arch/powerpc/xmon/xmon.c
+++ b/arch/powerpc/xmon/xmon.c
@@ -76,9 +76,6 @@ static cpumask_t xmon_batch_cpus = CPU_MASK_NONE;
 #define xmon_owner 0
 #endif /* CONFIG_SMP */
 
-#ifdef CONFIG_PPC_PSERIES
-static int set_indicator_token = RTAS_UNKNOWN_SERVICE;
-#endif
 static unsigned long in_xmon __read_mostly = 0;
 static int xmon_on = IS_ENABLED(CONFIG_XMON_DEFAULT);
 static bool xmon_is_ro = IS_ENABLED(CONFIG_XMON_DEFAULT_RO_MODE);
@@ -398,6 +395,7 @@ static inline void disable_surveillance(void)
 #ifdef CONFIG_PPC_PSERIES
 	/* Since this can't be a module, args should end up below 4GB. */
 	static struct rtas_args args;
+	const s32 token = rtas_function_token(RTAS_FN_SET_INDICATOR);
 
 	/*
 	 * At this point we have got all the cpus we can into
@@ -406,10 +404,10 @@ static inline void disable_surveillance(void)
 	 * If we did try to take rtas.lock there would be a
 	 * real possibility of deadlock.
 	 */
-	if (set_indicator_token == RTAS_UNKNOWN_SERVICE)
+	if (token == RTAS_UNKNOWN_SERVICE)
 		return;
 
-	rtas_call_unlocked(&args, set_indicator_token, 3, 1, NULL,
+	rtas_call_unlocked(&args, token, 3, 1, NULL,
 			   SURVEILLANCE_TOKEN, 0, 0);
 
 #endif /* CONFIG_PPC_PSERIES */
@@ -3976,14 +3974,6 @@ static void xmon_init(int enable)
 		__debugger_iabr_match = xmon_iabr_match;
 		__debugger_break_match = xmon_break_match;
 		__debugger_fault_handler = xmon_fault_handler;
-
-#ifdef CONFIG_PPC_PSERIES
-		/*
-		 * Get the token here to avoid trying to get a lock
-		 * during the crash, causing a deadlock.
-		 */
-		set_indicator_token = rtas_token("set-indicator");
-#endif
 	} else {
 		__debugger = NULL;
 		__debugger_ipi = NULL;

-- 
2.39.1



* [PATCH v2 19/19] powerpc/rtas: arch-wide function token lookup conversions
@ 2023-02-06 18:54   ` Nathan Lynch via B4 Submission Endpoint
From: Nathan Lynch via B4 Submission Endpoint @ 2023-02-06 18:54 UTC (permalink / raw)
  To: Michael Ellerman, Nicholas Piggin, Christophe Leroy, Kajol Jain,
	Laurent Dufour, Mahesh J Salgaonkar, Andrew Donnellan,
	Nick Child
  Cc: Nathan Lynch, linuxppc-dev

From: Nathan Lynch <nathanl@linux.ibm.com>

With the tokens for all implemented RTAS functions now available via
rtas_function_token(), which is a constant-time lookup and safe to
call from any context, there is no need to use rtas_token() or to
cache its result.

Most conversions are trivial, but a few are worth describing in more
detail:

* Error injection token comparisons for lockdown purposes are
  consolidated into a simple predicate: token_is_restricted_errinjct().

* A couple of special cases in block_rtas_call() do not use
  rtas_token() but perform string comparisons against names in the
  function table. These are converted to compare against token values
  instead, which is logically equivalent but less expensive.

* The ibm,os-term token was cached at boot only to avoid device tree
  traversal during panic. Its lookup can now be deferred until
  needed; only the presence of the ibm,extended-os-term property is
  still recorded at boot.

* Since rtas_function_token() accesses a read-only data structure
  without taking any locks, xmon's lookup of set-indicator can be
  performed as needed instead of cached at startup.
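
For illustration only, the reasoning above can be sketched in
standalone C. This is not the kernel's code: the enum, the table, the
helper names (fn_index, fn_table, fn_table_set_token, fn_token,
token_is_restricted) and the token values are all hypothetical
stand-ins. The sketch assumes a table indexed by a symbolic constant
that is filled in once during early init and only read afterwards,
which is why a lookup needs no lock and no string comparison.

```c
#include <stdbool.h>
#include <stdint.h>

#define UNKNOWN_SERVICE (-1)	/* stand-in for RTAS_UNKNOWN_SERVICE */

/* Hypothetical, heavily reduced function index; not the kernel's enum. */
enum fn_index {
	FN_IBM_ERRINJCT,
	FN_IBM_OPEN_ERRINJCT,
	FN_SET_INDICATOR,
	FN_COUNT,
};

struct fn_entry {
	const char *name;	/* firmware-visible function name */
	int32_t token;		/* UNKNOWN_SERVICE if not implemented */
};

/* Filled in once during early init, then treated as read-only. */
static struct fn_entry fn_table[FN_COUNT] = {
	[FN_IBM_ERRINJCT]      = { "ibm,errinjct",      UNKNOWN_SERVICE },
	[FN_IBM_OPEN_ERRINJCT] = { "ibm,open-errinjct", UNKNOWN_SERVICE },
	[FN_SET_INDICATOR]     = { "set-indicator",     UNKNOWN_SERVICE },
};

/* Assumed init-time helper: record the token that firmware advertises. */
static void fn_table_set_token(enum fn_index idx, int32_t token)
{
	fn_table[idx].token = token;
}

/* Constant-time lookup: one array index, no locks, no string compares. */
static int32_t fn_token(enum fn_index idx)
{
	return fn_table[idx].token;
}

/* Predicate in the spirit of token_is_restricted_errinjct(). */
static bool token_is_restricted(int32_t token)
{
	return token == fn_token(FN_IBM_OPEN_ERRINJCT) ||
	       token == fn_token(FN_IBM_ERRINJCT);
}
```

With made-up tokens 38 for ibm,errinjct and 11 for set-indicator
recorded via fn_table_set_token(), token_is_restricted(38) is true and
token_is_restricted(11) is false. Because each lookup is a single
array read, callers can fetch a token at the point of use rather than
keeping a private cached copy.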

Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
---
 arch/powerpc/kernel/rtas-proc.c               | 24 ++++----
 arch/powerpc/kernel/rtas-rtc.c                |  6 +-
 arch/powerpc/kernel/rtas.c                    | 79 +++++++++++++--------------
 arch/powerpc/kernel/rtas_flash.c              | 21 ++++---
 arch/powerpc/kernel/rtas_pci.c                |  8 +--
 arch/powerpc/kernel/rtasd.c                   |  2 +-
 arch/powerpc/platforms/52xx/efika.c           |  4 +-
 arch/powerpc/platforms/cell/ras.c             |  4 +-
 arch/powerpc/platforms/cell/smp.c             |  4 +-
 arch/powerpc/platforms/chrp/nvram.c           |  4 +-
 arch/powerpc/platforms/chrp/pci.c             |  4 +-
 arch/powerpc/platforms/chrp/setup.c           |  4 +-
 arch/powerpc/platforms/maple/setup.c          |  4 +-
 arch/powerpc/platforms/pseries/dlpar.c        |  2 +-
 arch/powerpc/platforms/pseries/eeh_pseries.c  | 22 ++++----
 arch/powerpc/platforms/pseries/hotplug-cpu.c  |  4 +-
 arch/powerpc/platforms/pseries/io_event_irq.c |  2 +-
 arch/powerpc/platforms/pseries/mobility.c     |  4 +-
 arch/powerpc/platforms/pseries/msi.c          |  4 +-
 arch/powerpc/platforms/pseries/nvram.c        |  4 +-
 arch/powerpc/platforms/pseries/pci.c          |  2 +-
 arch/powerpc/platforms/pseries/ras.c          |  2 +-
 arch/powerpc/platforms/pseries/setup.c        |  8 +--
 arch/powerpc/platforms/pseries/smp.c          |  6 +-
 arch/powerpc/sysdev/xics/ics-rtas.c           |  8 +--
 arch/powerpc/xmon/xmon.c                      | 16 +-----
 26 files changed, 119 insertions(+), 133 deletions(-)

diff --git a/arch/powerpc/kernel/rtas-proc.c b/arch/powerpc/kernel/rtas-proc.c
index 081b2b741a8c..9454b8395b6a 100644
--- a/arch/powerpc/kernel/rtas-proc.c
+++ b/arch/powerpc/kernel/rtas-proc.c
@@ -287,9 +287,9 @@ static ssize_t ppc_rtas_poweron_write(struct file *file,
 
 	rtc_time64_to_tm(nowtime, &tm);
 
-	error = rtas_call(rtas_token("set-time-for-power-on"), 7, 1, NULL, 
-			tm.tm_year + 1900, tm.tm_mon + 1, tm.tm_mday,
-			tm.tm_hour, tm.tm_min, tm.tm_sec, 0 /* nano */);
+	error = rtas_call(rtas_function_token(RTAS_FN_SET_TIME_FOR_POWER_ON), 7, 1, NULL,
+			  tm.tm_year + 1900, tm.tm_mon + 1, tm.tm_mday,
+			  tm.tm_hour, tm.tm_min, tm.tm_sec, 0 /* nano */);
 	if (error)
 		printk(KERN_WARNING "error: setting poweron time returned: %s\n", 
 				ppc_rtas_process_error(error));
@@ -350,9 +350,9 @@ static ssize_t ppc_rtas_clock_write(struct file *file,
 		return error;
 
 	rtc_time64_to_tm(nowtime, &tm);
-	error = rtas_call(rtas_token("set-time-of-day"), 7, 1, NULL, 
-			tm.tm_year + 1900, tm.tm_mon + 1, tm.tm_mday,
-			tm.tm_hour, tm.tm_min, tm.tm_sec, 0);
+	error = rtas_call(rtas_function_token(RTAS_FN_SET_TIME_OF_DAY), 7, 1, NULL,
+			  tm.tm_year + 1900, tm.tm_mon + 1, tm.tm_mday,
+			  tm.tm_hour, tm.tm_min, tm.tm_sec, 0);
 	if (error)
 		printk(KERN_WARNING "error: setting the clock returned: %s\n", 
 				ppc_rtas_process_error(error));
@@ -362,7 +362,7 @@ static ssize_t ppc_rtas_clock_write(struct file *file,
 static int ppc_rtas_clock_show(struct seq_file *m, void *v)
 {
 	int ret[8];
-	int error = rtas_call(rtas_token("get-time-of-day"), 0, 8, ret);
+	int error = rtas_call(rtas_function_token(RTAS_FN_GET_TIME_OF_DAY), 0, 8, ret);
 
 	if (error) {
 		printk(KERN_WARNING "error: reading the clock returned: %s\n", 
@@ -385,7 +385,7 @@ static int ppc_rtas_sensors_show(struct seq_file *m, void *v)
 {
 	int i,j;
 	int state, error;
-	int get_sensor_state = rtas_token("get-sensor-state");
+	int get_sensor_state = rtas_function_token(RTAS_FN_GET_SENSOR_STATE);
 
 	seq_printf(m, "RTAS (RunTime Abstraction Services) Sensor Information\n");
 	seq_printf(m, "Sensor\t\tValue\t\tCondition\tLocation\n");
@@ -708,8 +708,8 @@ static ssize_t ppc_rtas_tone_freq_write(struct file *file,
 		return error;
 
 	rtas_tone_frequency = freq; /* save it for later */
-	error = rtas_call(rtas_token("set-indicator"), 3, 1, NULL,
-			TONE_FREQUENCY, 0, freq);
+	error = rtas_call(rtas_function_token(RTAS_FN_SET_INDICATOR), 3, 1, NULL,
+			  TONE_FREQUENCY, 0, freq);
 	if (error)
 		printk(KERN_WARNING "error: setting tone frequency returned: %s\n", 
 				ppc_rtas_process_error(error));
@@ -736,8 +736,8 @@ static ssize_t ppc_rtas_tone_volume_write(struct file *file,
 		volume = 100;
 	
         rtas_tone_volume = volume; /* save it for later */
-	error = rtas_call(rtas_token("set-indicator"), 3, 1, NULL,
-			TONE_VOLUME, 0, volume);
+	error = rtas_call(rtas_function_token(RTAS_FN_SET_INDICATOR), 3, 1, NULL,
+			  TONE_VOLUME, 0, volume);
 	if (error)
 		printk(KERN_WARNING "error: setting tone volume returned: %s\n", 
 				ppc_rtas_process_error(error));
diff --git a/arch/powerpc/kernel/rtas-rtc.c b/arch/powerpc/kernel/rtas-rtc.c
index 5a31d1829bca..6996214532bd 100644
--- a/arch/powerpc/kernel/rtas-rtc.c
+++ b/arch/powerpc/kernel/rtas-rtc.c
@@ -21,7 +21,7 @@ time64_t __init rtas_get_boot_time(void)
 
 	max_wait_tb = get_tb() + tb_ticks_per_usec * 1000 * MAX_RTC_WAIT;
 	do {
-		error = rtas_call(rtas_token("get-time-of-day"), 0, 8, ret);
+		error = rtas_call(rtas_function_token(RTAS_FN_GET_TIME_OF_DAY), 0, 8, ret);
 
 		wait_time = rtas_busy_delay_time(error);
 		if (wait_time) {
@@ -53,7 +53,7 @@ void rtas_get_rtc_time(struct rtc_time *rtc_tm)
 
 	max_wait_tb = get_tb() + tb_ticks_per_usec * 1000 * MAX_RTC_WAIT;
 	do {
-		error = rtas_call(rtas_token("get-time-of-day"), 0, 8, ret);
+		error = rtas_call(rtas_function_token(RTAS_FN_GET_TIME_OF_DAY), 0, 8, ret);
 
 		wait_time = rtas_busy_delay_time(error);
 		if (wait_time) {
@@ -90,7 +90,7 @@ int rtas_set_rtc_time(struct rtc_time *tm)
 
 	max_wait_tb = get_tb() + tb_ticks_per_usec * 1000 * MAX_RTC_WAIT;
 	do {
-	        error = rtas_call(rtas_token("set-time-of-day"), 7, 1, NULL,
+		error = rtas_call(rtas_function_token(RTAS_FN_SET_TIME_OF_DAY), 7, 1, NULL,
 				  tm->tm_year + 1900, tm->tm_mon + 1,
 				  tm->tm_mday, tm->tm_hour, tm->tm_min,
 				  tm->tm_sec, 0);
diff --git a/arch/powerpc/kernel/rtas.c b/arch/powerpc/kernel/rtas.c
index 17e59306ce63..833f262a2165 100644
--- a/arch/powerpc/kernel/rtas.c
+++ b/arch/powerpc/kernel/rtas.c
@@ -776,8 +776,8 @@ void rtas_progress(char *s, unsigned short hex)
 					"ibm,display-truncation-length", NULL);
 			of_node_put(root);
 		}
-		display_character = rtas_token("display-character");
-		set_indicator = rtas_token("set-indicator");
+		display_character = rtas_function_token(RTAS_FN_DISPLAY_CHARACTER);
+		set_indicator = rtas_function_token(RTAS_FN_SET_INDICATOR);
 	}
 
 	if (display_character == RTAS_UNKNOWN_SERVICE) {
@@ -931,7 +931,6 @@ static void __init init_error_log_max(void)
 
 
 static char rtas_err_buf[RTAS_ERROR_LOG_MAX];
-static int rtas_last_error_token;
 
 /** Return a copy of the detailed error text associated with the
  *  most recent failed call to rtas.  Because the error text
@@ -941,16 +940,17 @@ static int rtas_last_error_token;
  */
 static char *__fetch_rtas_last_error(char *altbuf)
 {
+	const s32 token = rtas_function_token(RTAS_FN_RTAS_LAST_ERROR);
 	struct rtas_args err_args, save_args;
 	u32 bufsz;
 	char *buf = NULL;
 
-	if (rtas_last_error_token == -1)
+	if (token == -1)
 		return NULL;
 
 	bufsz = rtas_get_error_log_max();
 
-	err_args.token = cpu_to_be32(rtas_last_error_token);
+	err_args.token = cpu_to_be32(token);
 	err_args.nargs = cpu_to_be32(2);
 	err_args.nret = cpu_to_be32(1);
 	err_args.args[0] = cpu_to_be32(__pa(rtas_err_buf));
@@ -1019,8 +1019,11 @@ void rtas_call_unlocked(struct rtas_args *args, int token, int nargs, int nret,
 	va_end(list);
 }
 
-static int ibm_open_errinjct_token;
-static int ibm_errinjct_token;
+static bool token_is_restricted_errinjct(s32 token)
+{
+	return token == rtas_function_token(RTAS_FN_IBM_OPEN_ERRINJCT) ||
+	       token == rtas_function_token(RTAS_FN_IBM_ERRINJCT);
+}
 
 /**
  * rtas_call() - Invoke an RTAS firmware function.
@@ -1092,7 +1095,7 @@ int rtas_call(int token, int nargs, int nret, int *outputs, ...)
 	if (!rtas.entry || token == RTAS_UNKNOWN_SERVICE)
 		return -1;
 
-	if (token == ibm_open_errinjct_token || token == ibm_errinjct_token) {
+	if (token_is_restricted_errinjct(token)) {
 		/*
 		 * It would be nicer to not discard the error value
 		 * from security_locked_down(), but callers expect an
@@ -1323,7 +1326,7 @@ static int rtas_error_rc(int rtas_rc)
 
 int rtas_get_power_level(int powerdomain, int *level)
 {
-	int token = rtas_token("get-power-level");
+	int token = rtas_function_token(RTAS_FN_GET_POWER_LEVEL);
 	int rc;
 
 	if (token == RTAS_UNKNOWN_SERVICE)
@@ -1340,7 +1343,7 @@ EXPORT_SYMBOL_GPL(rtas_get_power_level);
 
 int rtas_set_power_level(int powerdomain, int level, int *setlevel)
 {
-	int token = rtas_token("set-power-level");
+	int token = rtas_function_token(RTAS_FN_SET_POWER_LEVEL);
 	int rc;
 
 	if (token == RTAS_UNKNOWN_SERVICE)
@@ -1358,7 +1361,7 @@ EXPORT_SYMBOL_GPL(rtas_set_power_level);
 
 int rtas_get_sensor(int sensor, int index, int *state)
 {
-	int token = rtas_token("get-sensor-state");
+	int token = rtas_function_token(RTAS_FN_GET_SENSOR_STATE);
 	int rc;
 
 	if (token == RTAS_UNKNOWN_SERVICE)
@@ -1376,7 +1379,7 @@ EXPORT_SYMBOL_GPL(rtas_get_sensor);
 
 int rtas_get_sensor_fast(int sensor, int index, int *state)
 {
-	int token = rtas_token("get-sensor-state");
+	int token = rtas_function_token(RTAS_FN_GET_SENSOR_STATE);
 	int rc;
 
 	if (token == RTAS_UNKNOWN_SERVICE)
@@ -1418,7 +1421,7 @@ bool rtas_indicator_present(int token, int *maxindex)
 
 int rtas_set_indicator(int indicator, int index, int new_value)
 {
-	int token = rtas_token("set-indicator");
+	int token = rtas_function_token(RTAS_FN_SET_INDICATOR);
 	int rc;
 
 	if (token == RTAS_UNKNOWN_SERVICE)
@@ -1439,8 +1442,8 @@ EXPORT_SYMBOL_GPL(rtas_set_indicator);
  */
 int rtas_set_indicator_fast(int indicator, int index, int new_value)
 {
+	int token = rtas_function_token(RTAS_FN_SET_INDICATOR);
 	int rc;
-	int token = rtas_token("set-indicator");
 
 	if (token == RTAS_UNKNOWN_SERVICE)
 		return -ENOENT;
@@ -1482,10 +1485,11 @@ int rtas_set_indicator_fast(int indicator, int index, int new_value)
  */
 int rtas_ibm_suspend_me(int *fw_status)
 {
+	int token = rtas_function_token(RTAS_FN_IBM_SUSPEND_ME);
 	int fwrc;
 	int ret;
 
-	fwrc = rtas_call(rtas_token("ibm,suspend-me"), 0, 1, NULL);
+	fwrc = rtas_call(token, 0, 1, NULL);
 
 	switch (fwrc) {
 	case 0:
@@ -1518,7 +1522,7 @@ void __noreturn rtas_restart(char *cmd)
 	if (rtas_flash_term_hook)
 		rtas_flash_term_hook(SYS_RESTART);
 	pr_emerg("system-reboot returned %d\n",
-		 rtas_call(rtas_token("system-reboot"), 0, 1, NULL));
+		 rtas_call(rtas_function_token(RTAS_FN_SYSTEM_REBOOT), 0, 1, NULL));
 	for (;;);
 }
 
@@ -1528,7 +1532,7 @@ void rtas_power_off(void)
 		rtas_flash_term_hook(SYS_POWER_OFF);
 	/* allow power on only with power button press */
 	pr_emerg("power-off returned %d\n",
-		 rtas_call(rtas_token("power-off"), 2, 1, NULL, -1, -1));
+		 rtas_call(rtas_function_token(RTAS_FN_POWER_OFF), 2, 1, NULL, -1, -1));
 	for (;;);
 }
 
@@ -1538,16 +1542,17 @@ void __noreturn rtas_halt(void)
 		rtas_flash_term_hook(SYS_HALT);
 	/* allow power on only with power button press */
 	pr_emerg("power-off returned %d\n",
-		 rtas_call(rtas_token("power-off"), 2, 1, NULL, -1, -1));
+		 rtas_call(rtas_function_token(RTAS_FN_POWER_OFF), 2, 1, NULL, -1, -1));
 	for (;;);
 }
 
 /* Must be in the RMO region, so we place it here */
 static char rtas_os_term_buf[2048];
-static s32 ibm_os_term_token = RTAS_UNKNOWN_SERVICE;
+static bool ibm_extended_os_term;
 
 void rtas_os_term(char *str)
 {
+	s32 token = rtas_function_token(RTAS_FN_IBM_OS_TERM);
 	int status;
 
 	/*
@@ -1556,7 +1561,8 @@ void rtas_os_term(char *str)
 	 * this property may terminate the partition which we want to avoid
 	 * since it interferes with panic_timeout.
 	 */
-	if (ibm_os_term_token == RTAS_UNKNOWN_SERVICE)
+
+	if (token == RTAS_UNKNOWN_SERVICE || !ibm_extended_os_term)
 		return;
 
 	snprintf(rtas_os_term_buf, 2048, "OS panic: %s", str);
@@ -1567,8 +1573,7 @@ void rtas_os_term(char *str)
 	 * schedules.
 	 */
 	do {
-		status = rtas_call(ibm_os_term_token, 1, 1, NULL,
-				   __pa(rtas_os_term_buf));
+		status = rtas_call(token, 1, 1, NULL, __pa(rtas_os_term_buf));
 	} while (rtas_busy_delay_time(status));
 
 	if (status != 0)
@@ -1588,10 +1593,9 @@ void rtas_os_term(char *str)
  */
 void rtas_activate_firmware(void)
 {
-	int token;
+	int token = rtas_function_token(RTAS_FN_IBM_ACTIVATE_FIRMWARE);
 	int fwrc;
 
-	token = rtas_token("ibm,activate-firmware");
 	if (token == RTAS_UNKNOWN_SERVICE) {
 		pr_notice("ibm,activate-firmware method unavailable\n");
 		return;
@@ -1677,6 +1681,8 @@ static bool block_rtas_call(int token, int nargs,
 {
 	const struct rtas_function *func;
 	const struct rtas_filter *f;
+	const bool is_platform_dump = token == rtas_function_token(RTAS_FN_IBM_PLATFORM_DUMP);
+	const bool is_config_conn = token == rtas_function_token(RTAS_FN_IBM_CONFIGURE_CONNECTOR);
 	u32 base, size, end;
 
 	/*
@@ -1713,8 +1719,7 @@ static bool block_rtas_call(int token, int nargs,
 		 * Special case for ibm,platform-dump - NULL buffer
 		 * address is used to indicate end of dump processing
 		 */
-		if (!strcmp(func->name, "ibm,platform-dump") &&
-		    base == 0)
+		if (is_platform_dump && base == 0)
 			return false;
 
 		if (!in_rmo_buf(base, end))
@@ -1735,8 +1740,7 @@ static bool block_rtas_call(int token, int nargs,
 		 * Special case for ibm,configure-connector where the
 		 * address can be 0
 		 */
-		if (!strcmp(func->name, "ibm,configure-connector") &&
-		    base == 0)
+		if (is_config_conn && base == 0)
 			return false;
 
 		if (!in_rmo_buf(base, end))
@@ -1791,7 +1795,7 @@ SYSCALL_DEFINE1(rtas, struct rtas_args __user *, uargs)
 	if (block_rtas_call(token, nargs, &args))
 		return -EINVAL;
 
-	if (token == ibm_open_errinjct_token || token == ibm_errinjct_token) {
+	if (token_is_restricted_errinjct(token)) {
 		int err;
 
 		err = security_locked_down(LOCKDOWN_RTAS_ERROR_INJECTION);
@@ -1800,7 +1804,7 @@ SYSCALL_DEFINE1(rtas, struct rtas_args __user *, uargs)
 	}
 
 	/* Need to handle ibm,suspend_me call specially */
-	if (token == rtas_token("ibm,suspend-me")) {
+	if (token == rtas_function_token(RTAS_FN_IBM_SUSPEND_ME)) {
 
 		/*
 		 * rtas_ibm_suspend_me assumes the streamid handle is in cpu
@@ -1935,11 +1939,10 @@ void __init rtas_initialize(void)
 	rtas_function_table_init();
 
 	/*
-	 * Discover these now to avoid device tree lookups in the
+	 * Discover this now to avoid a device tree lookup in the
 	 * panic path.
 	 */
-	if (of_property_read_bool(rtas.dev, "ibm,extended-os-term"))
-		ibm_os_term_token = rtas_token("ibm,os-term");
+	ibm_extended_os_term = of_property_read_bool(rtas.dev, "ibm,extended-os-term");
 
 	/* If RTAS was found, allocate the RMO buffer for it and look for
 	 * the stop-self token if any
@@ -1954,12 +1957,6 @@ void __init rtas_initialize(void)
 		panic("ERROR: RTAS: Failed to allocate %lx bytes below %pa\n",
 		      PAGE_SIZE, &rtas_region);
 
-#ifdef CONFIG_RTAS_ERROR_LOGGING
-	rtas_last_error_token = rtas_token("rtas-last-error");
-#endif
-	ibm_open_errinjct_token = rtas_token("ibm,open-errinjct");
-	ibm_errinjct_token = rtas_token("ibm,errinjct");
-
 	rtas_work_area_reserve_arena(rtas_region);
 }
 
@@ -2015,13 +2012,13 @@ void rtas_give_timebase(void)
 
 	raw_spin_lock_irqsave(&timebase_lock, flags);
 	hard_irq_disable();
-	rtas_call(rtas_token("freeze-time-base"), 0, 1, NULL);
+	rtas_call(rtas_function_token(RTAS_FN_FREEZE_TIME_BASE), 0, 1, NULL);
 	timebase = get_tb();
 	raw_spin_unlock(&timebase_lock);
 
 	while (timebase)
 		barrier();
-	rtas_call(rtas_token("thaw-time-base"), 0, 1, NULL);
+	rtas_call(rtas_function_token(RTAS_FN_THAW_TIME_BASE), 0, 1, NULL);
 	local_irq_restore(flags);
 }
 
diff --git a/arch/powerpc/kernel/rtas_flash.c b/arch/powerpc/kernel/rtas_flash.c
index bc817a5619d6..4caf5e3079eb 100644
--- a/arch/powerpc/kernel/rtas_flash.c
+++ b/arch/powerpc/kernel/rtas_flash.c
@@ -376,7 +376,7 @@ static void manage_flash(struct rtas_manage_flash_t *args_buf, unsigned int op)
 	s32 rc;
 
 	do {
-		rc = rtas_call(rtas_token("ibm,manage-flash-image"), 1, 1,
+		rc = rtas_call(rtas_function_token(RTAS_FN_IBM_MANAGE_FLASH_IMAGE), 1, 1,
 			       NULL, op);
 	} while (rtas_busy_delay(rc));
 
@@ -444,7 +444,7 @@ static ssize_t manage_flash_write(struct file *file, const char __user *buf,
  */
 static void validate_flash(struct rtas_validate_flash_t *args_buf)
 {
-	int token = rtas_token("ibm,validate-flash-image");
+	int token = rtas_function_token(RTAS_FN_IBM_VALIDATE_FLASH_IMAGE);
 	int update_results;
 	s32 rc;	
 
@@ -570,7 +570,7 @@ static void rtas_flash_firmware(int reboot_type)
 		return;
 	}
 
-	update_token = rtas_token("ibm,update-flash-64-and-reboot");
+	update_token = rtas_function_token(RTAS_FN_IBM_UPDATE_FLASH_64_AND_REBOOT);
 	if (update_token == RTAS_UNKNOWN_SERVICE) {
 		printk(KERN_ALERT "FLASH: ibm,update-flash-64-and-reboot "
 		       "is not available -- not a service partition?\n");
@@ -653,7 +653,7 @@ static void rtas_flash_firmware(int reboot_type)
  */
 struct rtas_flash_file {
 	const char *filename;
-	const char *rtas_call_name;
+	const rtas_fn_handle_t handle;
 	int *status;
 	const struct proc_ops ops;
 };
@@ -661,7 +661,7 @@ struct rtas_flash_file {
 static const struct rtas_flash_file rtas_flash_files[] = {
 	{
 		.filename	= "powerpc/rtas/" FIRMWARE_FLASH_NAME,
-		.rtas_call_name	= "ibm,update-flash-64-and-reboot",
+		.handle		= RTAS_FN_IBM_UPDATE_FLASH_64_AND_REBOOT,
 		.status		= &rtas_update_flash_data.status,
 		.ops.proc_read	= rtas_flash_read_msg,
 		.ops.proc_write	= rtas_flash_write,
@@ -670,7 +670,7 @@ static const struct rtas_flash_file rtas_flash_files[] = {
 	},
 	{
 		.filename	= "powerpc/rtas/" FIRMWARE_UPDATE_NAME,
-		.rtas_call_name	= "ibm,update-flash-64-and-reboot",
+		.handle		= RTAS_FN_IBM_UPDATE_FLASH_64_AND_REBOOT,
 		.status		= &rtas_update_flash_data.status,
 		.ops.proc_read	= rtas_flash_read_num,
 		.ops.proc_write	= rtas_flash_write,
@@ -679,7 +679,7 @@ static const struct rtas_flash_file rtas_flash_files[] = {
 	},
 	{
 		.filename	= "powerpc/rtas/" VALIDATE_FLASH_NAME,
-		.rtas_call_name	= "ibm,validate-flash-image",
+		.handle		= RTAS_FN_IBM_VALIDATE_FLASH_IMAGE,
 		.status		= &rtas_validate_flash_data.status,
 		.ops.proc_read	= validate_flash_read,
 		.ops.proc_write	= validate_flash_write,
@@ -688,7 +688,7 @@ static const struct rtas_flash_file rtas_flash_files[] = {
 	},
 	{
 		.filename	= "powerpc/rtas/" MANAGE_FLASH_NAME,
-		.rtas_call_name	= "ibm,manage-flash-image",
+		.handle		= RTAS_FN_IBM_MANAGE_FLASH_IMAGE,
 		.status		= &rtas_manage_flash_data.status,
 		.ops.proc_read	= manage_flash_read,
 		.ops.proc_write	= manage_flash_write,
@@ -700,8 +700,7 @@ static int __init rtas_flash_init(void)
 {
 	int i;
 
-	if (rtas_token("ibm,update-flash-64-and-reboot") ==
-		       RTAS_UNKNOWN_SERVICE) {
+	if (rtas_function_token(RTAS_FN_IBM_UPDATE_FLASH_64_AND_REBOOT) == RTAS_UNKNOWN_SERVICE) {
 		pr_info("rtas_flash: no firmware flash support\n");
 		return -EINVAL;
 	}
@@ -730,7 +729,7 @@ static int __init rtas_flash_init(void)
 		 * This code assumes that the status int is the first member of the
 		 * struct
 		 */
-		token = rtas_token(f->rtas_call_name);
+		token = rtas_function_token(f->handle);
 		if (token == RTAS_UNKNOWN_SERVICE)
 			*f->status = FLASH_AUTH;
 		else
diff --git a/arch/powerpc/kernel/rtas_pci.c b/arch/powerpc/kernel/rtas_pci.c
index 5a2f5ea3b054..e1fdc7473b72 100644
--- a/arch/powerpc/kernel/rtas_pci.c
+++ b/arch/powerpc/kernel/rtas_pci.c
@@ -191,10 +191,10 @@ static void python_countermeasures(struct device_node *dev)
 
 void __init init_pci_config_tokens(void)
 {
-	read_pci_config = rtas_token("read-pci-config");
-	write_pci_config = rtas_token("write-pci-config");
-	ibm_read_pci_config = rtas_token("ibm,read-pci-config");
-	ibm_write_pci_config = rtas_token("ibm,write-pci-config");
+	read_pci_config = rtas_function_token(RTAS_FN_READ_PCI_CONFIG);
+	write_pci_config = rtas_function_token(RTAS_FN_WRITE_PCI_CONFIG);
+	ibm_read_pci_config = rtas_function_token(RTAS_FN_IBM_READ_PCI_CONFIG);
+	ibm_write_pci_config = rtas_function_token(RTAS_FN_IBM_WRITE_PCI_CONFIG);
 }
 
 unsigned long get_phb_buid(struct device_node *phb)
diff --git a/arch/powerpc/kernel/rtasd.c b/arch/powerpc/kernel/rtasd.c
index cc56ac6ba4b0..9bba469239fc 100644
--- a/arch/powerpc/kernel/rtasd.c
+++ b/arch/powerpc/kernel/rtasd.c
@@ -506,7 +506,7 @@ static int __init rtas_event_scan_init(void)
 		return 0;
 
 	/* No RTAS */
-	event_scan = rtas_token("event-scan");
+	event_scan = rtas_function_token(RTAS_FN_EVENT_SCAN);
 	if (event_scan == RTAS_UNKNOWN_SERVICE) {
 		printk(KERN_INFO "rtasd: No event-scan on system\n");
 		return -ENODEV;
diff --git a/arch/powerpc/platforms/52xx/efika.c b/arch/powerpc/platforms/52xx/efika.c
index e0647720ed5e..61dfec74ff85 100644
--- a/arch/powerpc/platforms/52xx/efika.c
+++ b/arch/powerpc/platforms/52xx/efika.c
@@ -41,7 +41,7 @@ static int rtas_read_config(struct pci_bus *bus, unsigned int devfn, int offset,
 	int ret = -1;
 	int rval;
 
-	rval = rtas_call(rtas_token("read-pci-config"), 2, 2, &ret, addr, len);
+	rval = rtas_call(rtas_function_token(RTAS_FN_READ_PCI_CONFIG), 2, 2, &ret, addr, len);
 	*val = ret;
 	return rval ? PCIBIOS_DEVICE_NOT_FOUND : PCIBIOS_SUCCESSFUL;
 }
@@ -55,7 +55,7 @@ static int rtas_write_config(struct pci_bus *bus, unsigned int devfn,
 	    | (hose->global_number << 24);
 	int rval;
 
-	rval = rtas_call(rtas_token("write-pci-config"), 3, 1, NULL,
+	rval = rtas_call(rtas_function_token(RTAS_FN_WRITE_PCI_CONFIG), 3, 1, NULL,
 			 addr, len, val);
 	return rval ? PCIBIOS_DEVICE_NOT_FOUND : PCIBIOS_SUCCESSFUL;
 }
diff --git a/arch/powerpc/platforms/cell/ras.c b/arch/powerpc/platforms/cell/ras.c
index 8d934ea6270c..98db63b72d56 100644
--- a/arch/powerpc/platforms/cell/ras.c
+++ b/arch/powerpc/platforms/cell/ras.c
@@ -297,8 +297,8 @@ int cbe_sysreset_hack(void)
 static int __init cbe_ptcal_init(void)
 {
 	int ret;
-	ptcal_start_tok = rtas_token("ibm,cbe-start-ptcal");
-	ptcal_stop_tok = rtas_token("ibm,cbe-stop-ptcal");
+	ptcal_start_tok = rtas_function_token(RTAS_FN_IBM_CBE_START_PTCAL);
+	ptcal_stop_tok = rtas_function_token(RTAS_FN_IBM_CBE_STOP_PTCAL);
 
 	if (ptcal_start_tok == RTAS_UNKNOWN_SERVICE
 			|| ptcal_stop_tok == RTAS_UNKNOWN_SERVICE)
diff --git a/arch/powerpc/platforms/cell/smp.c b/arch/powerpc/platforms/cell/smp.c
index 31ce00b52a32..30394c6f8894 100644
--- a/arch/powerpc/platforms/cell/smp.c
+++ b/arch/powerpc/platforms/cell/smp.c
@@ -81,7 +81,7 @@ static inline int smp_startup_cpu(unsigned int lcpu)
 	 * If the RTAS start-cpu token does not exist then presume the
 	 * cpu is already spinning.
 	 */
-	start_cpu = rtas_token("start-cpu");
+	start_cpu = rtas_function_token(RTAS_FN_START_CPU);
 	if (start_cpu == RTAS_UNKNOWN_SERVICE)
 		return 1;
 
@@ -152,7 +152,7 @@ void __init smp_init_cell(void)
 	cpumask_clear_cpu(boot_cpuid, &of_spin_map);
 
 	/* Non-lpar has additional take/give timebase */
-	if (rtas_token("freeze-time-base") != RTAS_UNKNOWN_SERVICE) {
+	if (rtas_function_token(RTAS_FN_FREEZE_TIME_BASE) != RTAS_UNKNOWN_SERVICE) {
 		smp_ops->give_timebase = rtas_give_timebase;
 		smp_ops->take_timebase = rtas_take_timebase;
 	}
diff --git a/arch/powerpc/platforms/chrp/nvram.c b/arch/powerpc/platforms/chrp/nvram.c
index dab78076fedb..0eedae96498c 100644
--- a/arch/powerpc/platforms/chrp/nvram.c
+++ b/arch/powerpc/platforms/chrp/nvram.c
@@ -31,7 +31,7 @@ static unsigned char chrp_nvram_read_val(int addr)
 		return 0xff;
 	}
 	spin_lock_irqsave(&nvram_lock, flags);
-	if ((rtas_call(rtas_token("nvram-fetch"), 3, 2, &done, addr,
+	if ((rtas_call(rtas_function_token(RTAS_FN_NVRAM_FETCH), 3, 2, &done, addr,
 		       __pa(nvram_buf), 1) != 0) || 1 != done)
 		ret = 0xff;
 	else
@@ -53,7 +53,7 @@ static void chrp_nvram_write_val(int addr, unsigned char val)
 	}
 	spin_lock_irqsave(&nvram_lock, flags);
 	nvram_buf[0] = val;
-	if ((rtas_call(rtas_token("nvram-store"), 3, 2, &done, addr,
+	if ((rtas_call(rtas_function_token(RTAS_FN_NVRAM_STORE), 3, 2, &done, addr,
 		       __pa(nvram_buf), 1) != 0) || 1 != done)
 		printk(KERN_DEBUG "rtas IO error storing 0x%02x at %d", val, addr);
 	spin_unlock_irqrestore(&nvram_lock, flags);
diff --git a/arch/powerpc/platforms/chrp/pci.c b/arch/powerpc/platforms/chrp/pci.c
index 6f6598e771ff..428fd2a7b3ee 100644
--- a/arch/powerpc/platforms/chrp/pci.c
+++ b/arch/powerpc/platforms/chrp/pci.c
@@ -104,7 +104,7 @@ static int rtas_read_config(struct pci_bus *bus, unsigned int devfn, int offset,
         int ret = -1;
 	int rval;
 
-	rval = rtas_call(rtas_token("read-pci-config"), 2, 2, &ret, addr, len);
+	rval = rtas_call(rtas_function_token(RTAS_FN_READ_PCI_CONFIG), 2, 2, &ret, addr, len);
 	*val = ret;
 	return rval? PCIBIOS_DEVICE_NOT_FOUND: PCIBIOS_SUCCESSFUL;
 }
@@ -118,7 +118,7 @@ static int rtas_write_config(struct pci_bus *bus, unsigned int devfn, int offset
 		| (hose->global_number << 24);
 	int rval;
 
-	rval = rtas_call(rtas_token("write-pci-config"), 3, 1, NULL,
+	rval = rtas_call(rtas_function_token(RTAS_FN_WRITE_PCI_CONFIG), 3, 1, NULL,
 			 addr, len, val);
 	return rval? PCIBIOS_DEVICE_NOT_FOUND: PCIBIOS_SUCCESSFUL;
 }
diff --git a/arch/powerpc/platforms/chrp/setup.c b/arch/powerpc/platforms/chrp/setup.c
index ec63c0558db6..d9049ceb1046 100644
--- a/arch/powerpc/platforms/chrp/setup.c
+++ b/arch/powerpc/platforms/chrp/setup.c
@@ -323,11 +323,11 @@ static void __init chrp_setup_arch(void)
 	printk("chrp type = %x [%s]\n", _chrp_type, chrp_names[_chrp_type]);
 
 	rtas_initialize();
-	if (rtas_token("display-character") >= 0)
+	if (rtas_function_token(RTAS_FN_DISPLAY_CHARACTER) >= 0)
 		ppc_md.progress = rtas_progress;
 
 	/* use RTAS time-of-day routines if available */
-	if (rtas_token("get-time-of-day") != RTAS_UNKNOWN_SERVICE) {
+	if (rtas_function_token(RTAS_FN_GET_TIME_OF_DAY) != RTAS_UNKNOWN_SERVICE) {
 		ppc_md.get_boot_time	= rtas_get_boot_time;
 		ppc_md.get_rtc_time	= rtas_get_rtc_time;
 		ppc_md.set_rtc_time	= rtas_set_rtc_time;
diff --git a/arch/powerpc/platforms/maple/setup.c b/arch/powerpc/platforms/maple/setup.c
index c26c379e1cc8..98c8e3603064 100644
--- a/arch/powerpc/platforms/maple/setup.c
+++ b/arch/powerpc/platforms/maple/setup.c
@@ -162,8 +162,8 @@ static struct smp_ops_t maple_smp_ops = {
 
 static void __init maple_use_rtas_reboot_and_halt_if_present(void)
 {
-	if (rtas_service_present("system-reboot") &&
-	    rtas_service_present("power-off")) {
+	if (rtas_function_implemented(RTAS_FN_SYSTEM_REBOOT) &&
+	    rtas_function_implemented(RTAS_FN_POWER_OFF)) {
 		ppc_md.restart = rtas_restart;
 		pm_power_off = rtas_power_off;
 		ppc_md.halt = rtas_halt;
diff --git a/arch/powerpc/platforms/pseries/dlpar.c b/arch/powerpc/platforms/pseries/dlpar.c
index 9b65b50a5456..75ffdbcd2865 100644
--- a/arch/powerpc/platforms/pseries/dlpar.c
+++ b/arch/powerpc/platforms/pseries/dlpar.c
@@ -143,7 +143,7 @@ struct device_node *dlpar_configure_connector(__be32 drc_index,
 	int cc_token;
 	int rc = -1;
 
-	cc_token = rtas_token("ibm,configure-connector");
+	cc_token = rtas_function_token(RTAS_FN_IBM_CONFIGURE_CONNECTOR);
 	if (cc_token == RTAS_UNKNOWN_SERVICE)
 		return NULL;
 
diff --git a/arch/powerpc/platforms/pseries/eeh_pseries.c b/arch/powerpc/platforms/pseries/eeh_pseries.c
index 6b507b62ce8f..def184da51cf 100644
--- a/arch/powerpc/platforms/pseries/eeh_pseries.c
+++ b/arch/powerpc/platforms/pseries/eeh_pseries.c
@@ -699,7 +699,7 @@ static int pseries_eeh_write_config(struct eeh_dev *edev, int where, int size, u
 static int pseries_send_allow_unfreeze(struct pci_dn *pdn, u16 *vf_pe_array, int cur_vfs)
 {
 	int rc;
-	int ibm_allow_unfreeze = rtas_token("ibm,open-sriov-allow-unfreeze");
+	int ibm_allow_unfreeze = rtas_function_token(RTAS_FN_IBM_OPEN_SRIOV_ALLOW_UNFREEZE);
 	unsigned long buid, addr;
 
 	addr = rtas_config_addr(pdn->busno, pdn->devfn, 0);
@@ -774,7 +774,7 @@ static int pseries_notify_resume(struct eeh_dev *edev)
 	if (!edev)
 		return -EEXIST;
 
-	if (rtas_token("ibm,open-sriov-allow-unfreeze") == RTAS_UNKNOWN_SERVICE)
+	if (rtas_function_token(RTAS_FN_IBM_OPEN_SRIOV_ALLOW_UNFREEZE) == RTAS_UNKNOWN_SERVICE)
 		return -EINVAL;
 
 	if (edev->pdev->is_physfn || edev->pdev->is_virtfn)
@@ -815,14 +815,14 @@ static int __init eeh_pseries_init(void)
 	int ret, config_addr;
 
 	/* figure out EEH RTAS function call tokens */
-	ibm_set_eeh_option		= rtas_token("ibm,set-eeh-option");
-	ibm_set_slot_reset		= rtas_token("ibm,set-slot-reset");
-	ibm_read_slot_reset_state2	= rtas_token("ibm,read-slot-reset-state2");
-	ibm_read_slot_reset_state	= rtas_token("ibm,read-slot-reset-state");
-	ibm_slot_error_detail		= rtas_token("ibm,slot-error-detail");
-	ibm_get_config_addr_info2	= rtas_token("ibm,get-config-addr-info2");
-	ibm_get_config_addr_info	= rtas_token("ibm,get-config-addr-info");
-	ibm_configure_pe		= rtas_token("ibm,configure-pe");
+	ibm_set_eeh_option		= rtas_function_token(RTAS_FN_IBM_SET_EEH_OPTION);
+	ibm_set_slot_reset		= rtas_function_token(RTAS_FN_IBM_SET_SLOT_RESET);
+	ibm_read_slot_reset_state2	= rtas_function_token(RTAS_FN_IBM_READ_SLOT_RESET_STATE2);
+	ibm_read_slot_reset_state	= rtas_function_token(RTAS_FN_IBM_READ_SLOT_RESET_STATE);
+	ibm_slot_error_detail		= rtas_function_token(RTAS_FN_IBM_SLOT_ERROR_DETAIL);
+	ibm_get_config_addr_info2	= rtas_function_token(RTAS_FN_IBM_GET_CONFIG_ADDR_INFO2);
+	ibm_get_config_addr_info	= rtas_function_token(RTAS_FN_IBM_GET_CONFIG_ADDR_INFO);
+	ibm_configure_pe		= rtas_function_token(RTAS_FN_IBM_CONFIGURE_PE);
 
 	/*
 	 * ibm,configure-pe and ibm,configure-bridge have the same semantics,
@@ -830,7 +830,7 @@ static int __init eeh_pseries_init(void)
 	 * ibm,configure-pe then fall back to using ibm,configure-bridge.
 	 */
 	if (ibm_configure_pe == RTAS_UNKNOWN_SERVICE)
-		ibm_configure_pe	= rtas_token("ibm,configure-bridge");
+		ibm_configure_pe	= rtas_function_token(RTAS_FN_IBM_CONFIGURE_BRIDGE);
 
 	/*
 	 * Necessary sanity check. We needn't check "get-config-addr-info"
diff --git a/arch/powerpc/platforms/pseries/hotplug-cpu.c b/arch/powerpc/platforms/pseries/hotplug-cpu.c
index 090ae5a1e0f5..982e5e4b5e06 100644
--- a/arch/powerpc/platforms/pseries/hotplug-cpu.c
+++ b/arch/powerpc/platforms/pseries/hotplug-cpu.c
@@ -855,8 +855,8 @@ static int __init pseries_cpu_hotplug_init(void)
 	ppc_md.cpu_release = dlpar_cpu_release;
 #endif /* CONFIG_ARCH_CPU_PROBE_RELEASE */
 
-	rtas_stop_self_token = rtas_token("stop-self");
-	qcss_tok = rtas_token("query-cpu-stopped-state");
+	rtas_stop_self_token = rtas_function_token(RTAS_FN_STOP_SELF);
+	qcss_tok = rtas_function_token(RTAS_FN_QUERY_CPU_STOPPED_STATE);
 
 	if (rtas_stop_self_token == RTAS_UNKNOWN_SERVICE ||
 			qcss_tok == RTAS_UNKNOWN_SERVICE) {
diff --git a/arch/powerpc/platforms/pseries/io_event_irq.c b/arch/powerpc/platforms/pseries/io_event_irq.c
index 7b74d4d34e9a..f411d4fe7b24 100644
--- a/arch/powerpc/platforms/pseries/io_event_irq.c
+++ b/arch/powerpc/platforms/pseries/io_event_irq.c
@@ -143,7 +143,7 @@ static int __init ioei_init(void)
 {
 	struct device_node *np;
 
-	ioei_check_exception_token = rtas_token("check-exception");
+	ioei_check_exception_token = rtas_function_token(RTAS_FN_CHECK_EXCEPTION);
 	if (ioei_check_exception_token == RTAS_UNKNOWN_SERVICE)
 		return -ENODEV;
 
diff --git a/arch/powerpc/platforms/pseries/mobility.c b/arch/powerpc/platforms/pseries/mobility.c
index 4cea71aa0f41..643d309d1bd0 100644
--- a/arch/powerpc/platforms/pseries/mobility.c
+++ b/arch/powerpc/platforms/pseries/mobility.c
@@ -195,7 +195,7 @@ static int update_dt_node(struct device_node *dn, s32 scope)
 	u32 nprops;
 	u32 vd;
 
-	update_properties_token = rtas_token("ibm,update-properties");
+	update_properties_token = rtas_function_token(RTAS_FN_IBM_UPDATE_PROPERTIES);
 	if (update_properties_token == RTAS_UNKNOWN_SERVICE)
 		return -EINVAL;
 
@@ -306,7 +306,7 @@ static int pseries_devicetree_update(s32 scope)
 	int update_nodes_token;
 	int rc;
 
-	update_nodes_token = rtas_token("ibm,update-nodes");
+	update_nodes_token = rtas_function_token(RTAS_FN_IBM_UPDATE_NODES);
 	if (update_nodes_token == RTAS_UNKNOWN_SERVICE)
 		return 0;
 
diff --git a/arch/powerpc/platforms/pseries/msi.c b/arch/powerpc/platforms/pseries/msi.c
index 3f05507e444d..423ee1d5bd94 100644
--- a/arch/powerpc/platforms/pseries/msi.c
+++ b/arch/powerpc/platforms/pseries/msi.c
@@ -679,8 +679,8 @@ static void rtas_msi_pci_irq_fixup(struct pci_dev *pdev)
 
 static int rtas_msi_init(void)
 {
-	query_token  = rtas_token("ibm,query-interrupt-source-number");
-	change_token = rtas_token("ibm,change-msi");
+	query_token  = rtas_function_token(RTAS_FN_IBM_QUERY_INTERRUPT_SOURCE_NUMBER);
+	change_token = rtas_function_token(RTAS_FN_IBM_CHANGE_MSI);
 
 	if ((query_token == RTAS_UNKNOWN_SERVICE) ||
 			(change_token == RTAS_UNKNOWN_SERVICE)) {
diff --git a/arch/powerpc/platforms/pseries/nvram.c b/arch/powerpc/platforms/pseries/nvram.c
index cbf1720eb4aa..8130c37962c0 100644
--- a/arch/powerpc/platforms/pseries/nvram.c
+++ b/arch/powerpc/platforms/pseries/nvram.c
@@ -227,8 +227,8 @@ int __init pSeries_nvram_init(void)
 
 	nvram_size = be32_to_cpup(nbytes_p);
 
-	nvram_fetch = rtas_token("nvram-fetch");
-	nvram_store = rtas_token("nvram-store");
+	nvram_fetch = rtas_function_token(RTAS_FN_NVRAM_FETCH);
+	nvram_store = rtas_function_token(RTAS_FN_NVRAM_STORE);
 	printk(KERN_INFO "PPC64 nvram contains %d bytes\n", nvram_size);
 	of_node_put(nvram);
 
diff --git a/arch/powerpc/platforms/pseries/pci.c b/arch/powerpc/platforms/pseries/pci.c
index 6e671c3809ec..60e0a58928ef 100644
--- a/arch/powerpc/platforms/pseries/pci.c
+++ b/arch/powerpc/platforms/pseries/pci.c
@@ -60,7 +60,7 @@ static int pseries_send_map_pe(struct pci_dev *pdev, u16 num_vfs,
 	struct pci_dn *pdn;
 	int rc;
 	unsigned long buid, addr;
-	int ibm_map_pes = rtas_token("ibm,open-sriov-map-pe-number");
+	int ibm_map_pes = rtas_function_token(RTAS_FN_IBM_OPEN_SRIOV_MAP_PE_NUMBER);
 
 	if (ibm_map_pes == RTAS_UNKNOWN_SERVICE)
 		return -EINVAL;
diff --git a/arch/powerpc/platforms/pseries/ras.c b/arch/powerpc/platforms/pseries/ras.c
index f12516c3998c..adafd593d9d3 100644
--- a/arch/powerpc/platforms/pseries/ras.c
+++ b/arch/powerpc/platforms/pseries/ras.c
@@ -155,7 +155,7 @@ static int __init init_ras_IRQ(void)
 {
 	struct device_node *np;
 
-	ras_check_exception_token = rtas_token("check-exception");
+	ras_check_exception_token = rtas_function_token(RTAS_FN_CHECK_EXCEPTION);
 
 	/* Internal Errors */
 	np = of_find_node_by_path("/event-sources/internal-errors");
diff --git a/arch/powerpc/platforms/pseries/setup.c b/arch/powerpc/platforms/pseries/setup.c
index 420a2fa48292..4a0cec8cf623 100644
--- a/arch/powerpc/platforms/pseries/setup.c
+++ b/arch/powerpc/platforms/pseries/setup.c
@@ -136,11 +136,11 @@ static void __init fwnmi_init(void)
 #endif
 	int ibm_nmi_register_token;
 
-	ibm_nmi_register_token = rtas_token("ibm,nmi-register");
+	ibm_nmi_register_token = rtas_function_token(RTAS_FN_IBM_NMI_REGISTER);
 	if (ibm_nmi_register_token == RTAS_UNKNOWN_SERVICE)
 		return;
 
-	ibm_nmi_interlock_token = rtas_token("ibm,nmi-interlock");
+	ibm_nmi_interlock_token = rtas_function_token(RTAS_FN_IBM_NMI_INTERLOCK);
 	if (WARN_ON(ibm_nmi_interlock_token == RTAS_UNKNOWN_SERVICE))
 		return;
 
@@ -1071,14 +1071,14 @@ static void __init pseries_init(void)
 static void pseries_power_off(void)
 {
 	int rc;
-	int rtas_poweroff_ups_token = rtas_token("ibm,power-off-ups");
+	int rtas_poweroff_ups_token = rtas_function_token(RTAS_FN_IBM_POWER_OFF_UPS);
 
 	if (rtas_flash_term_hook)
 		rtas_flash_term_hook(SYS_POWER_OFF);
 
 	if (rtas_poweron_auto == 0 ||
 		rtas_poweroff_ups_token == RTAS_UNKNOWN_SERVICE) {
-		rc = rtas_call(rtas_token("power-off"), 2, 1, NULL, -1, -1);
+		rc = rtas_call(rtas_function_token(RTAS_FN_POWER_OFF), 2, 1, NULL, -1, -1);
 		printk(KERN_INFO "RTAS power-off returned %d\n", rc);
 	} else {
 		rc = rtas_call(rtas_poweroff_ups_token, 0, 1, NULL);
diff --git a/arch/powerpc/platforms/pseries/smp.c b/arch/powerpc/platforms/pseries/smp.c
index 2bcfee86ff87..c597711ef20a 100644
--- a/arch/powerpc/platforms/pseries/smp.c
+++ b/arch/powerpc/platforms/pseries/smp.c
@@ -55,7 +55,7 @@ static cpumask_var_t of_spin_mask;
 int smp_query_cpu_stopped(unsigned int pcpu)
 {
 	int cpu_status, status;
-	int qcss_tok = rtas_token("query-cpu-stopped-state");
+	int qcss_tok = rtas_function_token(RTAS_FN_QUERY_CPU_STOPPED_STATE);
 
 	if (qcss_tok == RTAS_UNKNOWN_SERVICE) {
 		printk_once(KERN_INFO
@@ -108,7 +108,7 @@ static inline int smp_startup_cpu(unsigned int lcpu)
 	 * If the RTAS start-cpu token does not exist then presume the
 	 * cpu is already spinning.
 	 */
-	start_cpu = rtas_token("start-cpu");
+	start_cpu = rtas_function_token(RTAS_FN_START_CPU);
 	if (start_cpu == RTAS_UNKNOWN_SERVICE)
 		return 1;
 
@@ -266,7 +266,7 @@ void __init smp_init_pseries(void)
 	 * We know prom_init will not have started them if RTAS supports
 	 * query-cpu-stopped-state.
 	 */
-	if (rtas_token("query-cpu-stopped-state") == RTAS_UNKNOWN_SERVICE) {
+	if (rtas_function_token(RTAS_FN_QUERY_CPU_STOPPED_STATE) == RTAS_UNKNOWN_SERVICE) {
 		if (cpu_has_feature(CPU_FTR_SMT)) {
 			for_each_present_cpu(i) {
 				if (cpu_thread_in_core(i) == 0)
diff --git a/arch/powerpc/sysdev/xics/ics-rtas.c b/arch/powerpc/sysdev/xics/ics-rtas.c
index f8320f8e5bc7..b772a833d9b7 100644
--- a/arch/powerpc/sysdev/xics/ics-rtas.c
+++ b/arch/powerpc/sysdev/xics/ics-rtas.c
@@ -200,10 +200,10 @@ static struct ics ics_rtas = {
 
 __init int ics_rtas_init(void)
 {
-	ibm_get_xive = rtas_token("ibm,get-xive");
-	ibm_set_xive = rtas_token("ibm,set-xive");
-	ibm_int_on  = rtas_token("ibm,int-on");
-	ibm_int_off = rtas_token("ibm,int-off");
+	ibm_get_xive = rtas_function_token(RTAS_FN_IBM_GET_XIVE);
+	ibm_set_xive = rtas_function_token(RTAS_FN_IBM_SET_XIVE);
+	ibm_int_on  = rtas_function_token(RTAS_FN_IBM_INT_ON);
+	ibm_int_off = rtas_function_token(RTAS_FN_IBM_INT_OFF);
 
 	/* We enable the RTAS "ICS" if RTAS is present with the
 	 * appropriate tokens
diff --git a/arch/powerpc/xmon/xmon.c b/arch/powerpc/xmon/xmon.c
index 0da66bc4823d..73c620c2a3a1 100644
--- a/arch/powerpc/xmon/xmon.c
+++ b/arch/powerpc/xmon/xmon.c
@@ -76,9 +76,6 @@ static cpumask_t xmon_batch_cpus = CPU_MASK_NONE;
 #define xmon_owner 0
 #endif /* CONFIG_SMP */
 
-#ifdef CONFIG_PPC_PSERIES
-static int set_indicator_token = RTAS_UNKNOWN_SERVICE;
-#endif
 static unsigned long in_xmon __read_mostly = 0;
 static int xmon_on = IS_ENABLED(CONFIG_XMON_DEFAULT);
 static bool xmon_is_ro = IS_ENABLED(CONFIG_XMON_DEFAULT_RO_MODE);
@@ -398,6 +395,7 @@ static inline void disable_surveillance(void)
 #ifdef CONFIG_PPC_PSERIES
 	/* Since this can't be a module, args should end up below 4GB. */
 	static struct rtas_args args;
+	const s32 token = rtas_function_token(RTAS_FN_SET_INDICATOR);
 
 	/*
 	 * At this point we have got all the cpus we can into
@@ -406,10 +404,10 @@ static inline void disable_surveillance(void)
 	 * If we did try to take rtas.lock there would be a
 	 * real possibility of deadlock.
 	 */
-	if (set_indicator_token == RTAS_UNKNOWN_SERVICE)
+	if (token == RTAS_UNKNOWN_SERVICE)
 		return;
 
-	rtas_call_unlocked(&args, set_indicator_token, 3, 1, NULL,
+	rtas_call_unlocked(&args, token, 3, 1, NULL,
 			   SURVEILLANCE_TOKEN, 0, 0);
 
 #endif /* CONFIG_PPC_PSERIES */
@@ -3976,14 +3974,6 @@ static void xmon_init(int enable)
 		__debugger_iabr_match = xmon_iabr_match;
 		__debugger_break_match = xmon_break_match;
 		__debugger_fault_handler = xmon_fault_handler;
-
-#ifdef CONFIG_PPC_PSERIES
-		/*
-		 * Get the token here to avoid trying to get a lock
-		 * during the crash, causing a deadlock.
-		 */
-		set_indicator_token = rtas_token("set-indicator");
-#endif
 	} else {
 		__debugger = NULL;
 		__debugger_ipi = NULL;

-- 
2.39.1



* Re: [PATCH v2 01/19] powerpc/rtas: handle extended delays safely in early boot
  2023-02-06 18:54   ` Nathan Lynch via B4 Submission Endpoint
  (?)
@ 2023-02-08 11:20   ` Michael Ellerman
  2023-02-08 13:14     ` Nathan Lynch
  -1 siblings, 1 reply; 50+ messages in thread
From: Michael Ellerman @ 2023-02-08 11:20 UTC (permalink / raw)
  To: Nathan Lynch via B4 Submission Endpoint, Nicholas Piggin,
	Christophe Leroy, Kajol Jain, Laurent Dufour,
	Mahesh J Salgaonkar, Andrew Donnellan, Nick Child
  Cc: Nathan Lynch, linuxppc-dev

Nathan Lynch via B4 Submission Endpoint <devnull+nathanl.linux.ibm.com@kernel.org> writes:
> From: Nathan Lynch <nathanl@linux.ibm.com>
>
> Some code that runs early in boot calls RTAS functions that can return
> -2 or 990x statuses, which mean the caller should retry. An example is
> pSeries_cmo_feature_init(), which invokes ibm,get-system-parameter but
> treats these benign statuses as errors instead of retrying.
>
> pSeries_cmo_feature_init() and similar code should be made to retry
> until they succeed or receive a real error, using the usual pattern:
>
> 	do {
> 		rc = rtas_call(token, etc...);
> 	} while (rtas_busy_delay(rc));
>
> But rtas_busy_delay() will perform a timed sleep on any 990x
> status. This isn't safe so early in boot, before the CPU scheduler and
> timer subsystem have initialized.
>
> The -2 RTAS status is much more likely to occur during single-threaded
> boot than 990x in practice, at least on PowerVM. This is because -2
> usually means that RTAS made progress but exhausted its self-imposed
> timeslice, while 990x is associated with concurrent requests from the
> OS causing internal contention. Regardless, according to the language
> in PAPR, the OS should be prepared to handle either type of status at
> any time.
>
> Add a fallback path to rtas_busy_delay() to handle this as safely as
> possible, performing a small delay on 990x. Include a counter to
> detect retry loops that aren't making progress and bail out.
>
> This was found by inspection and I'm not aware of any real
> failures. However, the implementation of rtas_busy_delay() before
> commit 38f7b7067dae ("powerpc/rtas: rtas_busy_delay() improvements")
> was not susceptible to this problem, so let's treat this as a
> regression.
>
> Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
> Fixes: 38f7b7067dae ("powerpc/rtas: rtas_busy_delay() improvements")
> ---
>  arch/powerpc/kernel/rtas.c | 48 +++++++++++++++++++++++++++++++++++++++++++++-
>  1 file changed, 47 insertions(+), 1 deletion(-)
>
> diff --git a/arch/powerpc/kernel/rtas.c b/arch/powerpc/kernel/rtas.c
> index 795225d7f138..ec2df09a70cf 100644
> --- a/arch/powerpc/kernel/rtas.c
> +++ b/arch/powerpc/kernel/rtas.c
> @@ -606,6 +606,46 @@ unsigned int rtas_busy_delay_time(int status)
>  	return ms;
>  }
>  
> +/*
> + * Early boot fallback for rtas_busy_delay().
> + */
> +static bool __init rtas_busy_delay_early(int status)
> +{
> +	static size_t successive_ext_delays __initdata;
> +	bool ret;

I think the logic would be easier to read if this variable was called
"wait", but maybe that's just me.

> +	switch (status) {
> +	case RTAS_EXTENDED_DELAY_MIN...RTAS_EXTENDED_DELAY_MAX:
> +		/*
> +		 * In the unlikely case that we receive an extended
> +		 * delay status in early boot, the OS is probably not
> +		 * the cause, and there's nothing we can do to clear
> +		 * the condition. Best we can do is delay for a bit
> +		 * and hope it's transient. Lie to the caller if it
> +		 * seems like we're stuck in a retry loop.
> +		 */
> +		mdelay(1);
> +		ret = true;
> +		successive_ext_delays += 1;
> +		if (successive_ext_delays > 1000) {
> +			pr_err("too many extended delays, giving up\n");
> +			dump_stack();
> +			ret = false;

Shouldn't we zero successive_ext_delays here?

Otherwise a subsequent (possibly different) RTAS call will immediately
fail out here if it gets a single extended delay from RTAS, won't it?
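
To make the concern concrete, here is a minimal user-space sketch of the
early-boot fallback with the counter reset on bail-out. It is not the
kernel code: the sketch_ names and SKETCH_ constants are stand-ins made up
for illustration (the values follow the -2 and 990x statuses described in
the change log), and the mdelay(1), pr_err() and dump_stack() calls are
only noted in comments.

```c
#include <stdbool.h>
#include <stddef.h>

/* Stand-in constants for illustration; the kernel has its own definitions. */
#define SKETCH_RTAS_BUSY		(-2)
#define SKETCH_EXT_DELAY_MIN		9900
#define SKETCH_EXT_DELAY_MAX		9905
#define SKETCH_MAX_SUCCESSIVE_DELAYS	1000

static size_t sketch_successive_ext_delays;

/* Returns true if the caller should retry the RTAS call. */
static bool sketch_busy_delay_early(int status)
{
	bool ret;

	if (status >= SKETCH_EXT_DELAY_MIN && status <= SKETCH_EXT_DELAY_MAX) {
		/* The patch performs mdelay(1) here. */
		ret = true;
		sketch_successive_ext_delays += 1;
		if (sketch_successive_ext_delays > SKETCH_MAX_SUCCESSIVE_DELAYS) {
			/* The patch logs an error and dumps the stack here. */
			ret = false;
			/* Reset so a later call sequence starts from zero. */
			sketch_successive_ext_delays = 0;
		}
	} else if (status == SKETCH_RTAS_BUSY) {
		ret = true;
		sketch_successive_ext_delays = 0;
	} else {
		ret = false;
		sketch_successive_ext_delays = 0;
	}

	return ret;
}
```

With the reset in place, the call that follows a bail-out is treated like
any other first extended delay rather than failing immediately.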

> +		}
> +		break;
> +	case RTAS_BUSY:
> +		ret = true;
> +		successive_ext_delays = 0;
> +		break;
> +	default:
> +		ret = false;
> +		successive_ext_delays = 0;
> +		break;
> +	}
> +
> +	return ret;
> +}
> +
>  /**
>   * rtas_busy_delay() - helper for RTAS busy and extended delay statuses
>   *
> @@ -624,11 +664,17 @@ unsigned int rtas_busy_delay_time(int status)
>   * * false - @status is not @RTAS_BUSY nor an extended delay hint. The
>   *           caller is responsible for handling @status.
>   */
> -bool rtas_busy_delay(int status)
> +bool __ref rtas_busy_delay(int status)

Can you explain the __ref in the change log?

>  {
>  	unsigned int ms;
>  	bool ret;
>  
> +	/*
> +	 * Can't do timed sleeps before timekeeping is up.
> +	 */
> +	if (system_state < SYSTEM_SCHEDULING)
> +		return rtas_busy_delay_early(status);
> +
>  	switch (status) {
>  	case RTAS_EXTENDED_DELAY_MIN...RTAS_EXTENDED_DELAY_MAX:
>  		ret = true;
>

cheers


* Re: [PATCH v2 07/19] powerpc/rtas: improve function information lookups
  2023-02-06 18:54   ` Nathan Lynch
  (?)
@ 2023-02-08 11:57   ` Michael Ellerman
  2023-02-08 13:16     ` Nathan Lynch
  -1 siblings, 1 reply; 50+ messages in thread
From: Michael Ellerman @ 2023-02-08 11:57 UTC (permalink / raw)
  To: Nathan Lynch via B4 Submission Endpoint, Nicholas Piggin,
	Christophe Leroy, Kajol Jain, Laurent Dufour,
	Mahesh J Salgaonkar, Andrew Donnellan, Nick Child
  Cc: Nathan Lynch, linuxppc-dev

Nathan Lynch via B4 Submission Endpoint
<devnull+nathanl.linux.ibm.com@kernel.org> writes:
> From: Nathan Lynch <nathanl@linux.ibm.com>
>
> The core RTAS support code and its clients perform two types of lookup
> for RTAS firmware function information.
> 
...
> diff --git a/arch/powerpc/include/asm/rtas.h b/arch/powerpc/include/asm/rtas.h
> index 479a95cb2770..14fe79217c26 100644
> --- a/arch/powerpc/include/asm/rtas.h
> +++ b/arch/powerpc/include/asm/rtas.h
> @@ -16,6 +16,93 @@
>   * Copyright (C) 2001 PPC 64 Team, IBM Corp
>   */
>  
> +#define rtas_fnidx(x_) RTAS_FNIDX__ ## x_

I'd prefer we just spelt it out in full, to aid grepability and
cscope/tags etc.

> +enum rtas_function_index {
> +	rtas_fnidx(CHECK_EXCEPTION),
> +	rtas_fnidx(DISPLAY_CHARACTER),
> +	rtas_fnidx(EVENT_SCAN),
> +	rtas_fnidx(FREEZE_TIME_BASE),
> +	rtas_fnidx(GET_POWER_LEVEL),
> +	rtas_fnidx(GET_SENSOR_STATE),
> +	rtas_fnidx(GET_TERM_CHAR),
> +	rtas_fnidx(GET_TIME_OF_DAY),
> +	rtas_fnidx(IBM_ACTIVATE_FIRMWARE),
> +	rtas_fnidx(IBM_CBE_START_PTCAL),
> +	rtas_fnidx(IBM_CBE_STOP_PTCAL),
> +	rtas_fnidx(IBM_CHANGE_MSI),



cheers

* Re: [PATCH v2 11/19] powerpc/rtas: add work area allocator
  2023-02-06 18:54   ` Nathan Lynch via B4 Submission Endpoint
  (?)
@ 2023-02-08 11:58   ` Michael Ellerman
  2023-02-08 14:48     ` Nathan Lynch
  -1 siblings, 1 reply; 50+ messages in thread
From: Michael Ellerman @ 2023-02-08 11:58 UTC (permalink / raw)
  To: Nathan Lynch via B4 Submission Endpoint, Nicholas Piggin,
	Christophe Leroy, Kajol Jain, Laurent Dufour,
	Mahesh J Salgaonkar, Andrew Donnellan, Nick Child
  Cc: Nathan Lynch, linuxppc-dev

Nathan Lynch via B4 Submission Endpoint
<devnull+nathanl.linux.ibm.com@kernel.org> writes:
> diff --git a/arch/powerpc/include/asm/rtas-work-area.h b/arch/powerpc/include/asm/rtas-work-area.h
> new file mode 100644
> index 000000000000..76ccb039cc37
> --- /dev/null
> +++ b/arch/powerpc/include/asm/rtas-work-area.h
> @@ -0,0 +1,45 @@
> +/* SPDX-License-Identifier: GPL-2.0-only */
> +#ifndef POWERPC_RTAS_WORK_AREA_H
> +#define POWERPC_RTAS_WORK_AREA_H

The usual style would be _ASM_POWERPC_RTAS_WORK_AREA_H.

> diff --git a/arch/powerpc/kernel/rtas-work-area.c b/arch/powerpc/kernel/rtas-work-area.c
> new file mode 100644
> index 000000000000..75950e13a0fe
> --- /dev/null
> +++ b/arch/powerpc/kernel/rtas-work-area.c
> @@ -0,0 +1,208 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +
> +#define pr_fmt(fmt)	"rtas-work-area: " fmt
> +
> +#include <linux/genalloc.h>
> +#include <linux/log2.h>
> +#include <linux/kernel.h>
> +#include <linux/memblock.h>
> +#include <linux/mempool.h>
> +#include <linux/minmax.h>
> +#include <linux/mutex.h>
> +#include <linux/numa.h>
> +#include <linux/sizes.h>
> +#include <linux/wait.h>
> +
> +#include <asm/machdep.h>
> +#include <asm/rtas-work-area.h>
> +
> +enum {
> +	/*
> +	 * Ensure the pool is page-aligned.
> +	 */
> +	RTAS_WORK_AREA_ARENA_ALIGN = PAGE_SIZE,
> +
> +	RTAS_WORK_AREA_ARENA_SZ = SZ_256K,
> +	/*
> +	 * The smallest known work area size is for ibm,get-vpd's
> +	 * location code argument, which is limited to 79 characters
> +	 * plus 1 nul terminator.
> +	 *
> +	 * PAPR+ 7.3.20 ibm,get-vpd RTAS Call
> +	 * PAPR+ 12.3.2.4 Converged Location Code Rules - Length Restrictions
> +	 */
> +	RTAS_WORK_AREA_MIN_ALLOC_SZ = roundup_pow_of_two(80),
> +	/*
> +	 * Don't let a single allocation claim the whole arena.
> +	 */
> +	RTAS_WORK_AREA_MAX_ALLOC_SZ = RTAS_WORK_AREA_ARENA_SZ / 2,
> +};
> +
> +static struct rtas_work_area_allocator_state {
> +	struct gen_pool *gen_pool;
> +	char *arena;
> +	struct mutex mutex; /* serializes allocations */
> +	struct wait_queue_head wqh;
> +	mempool_t descriptor_pool;
> +	bool available;
> +} rwa_state_ = {
> +	.mutex = __MUTEX_INITIALIZER(rwa_state_.mutex),
> +	.wqh = __WAIT_QUEUE_HEAD_INITIALIZER(rwa_state_.wqh),
> +};
> +static struct rtas_work_area_allocator_state *rwa_state = &rwa_state_;

I assumed the pointer was so you could swap this out at runtime or
something, but I don't think you do.

Any reason not to drop the pointer and just use rwa_state.foo accessors?
That would also allow the struct to be anonymous.

Or if you have the pointer you can at least make it NULL prior to init
and avoid the need for "available".
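
The second option could look roughly like the userspace sketch below. It is
only an illustration: rwa_state_model, rwa_model_init() and
rwa_model_available() are hypothetical names, not part of the patch. The
idea is that a pointer left NULL until init completes carries the same
information as the separate "available" flag.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, reduced stand-in for the allocator state in the patch. */
struct rwa_state_model {
	char *arena;
	size_t arena_size;
};

static struct rwa_state_model rwa_state_storage;

/* NULL until init completes; replaces a separate "available" flag. */
static struct rwa_state_model *rwa_state_ptr;

static bool rwa_model_available(void)
{
	return rwa_state_ptr != NULL;
}

static void rwa_model_init(char *arena, size_t size)
{
	rwa_state_storage.arena = arena;
	rwa_state_storage.arena_size = size;
	/* Publish the state only after every field is set up. */
	rwa_state_ptr = &rwa_state_storage;
}
```

Dropping the pointer entirely would instead keep the flag and access the
fields directly on the static struct.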

> +/*
> + * A single work area buffer and descriptor to serve requests early in
> + * boot before the allocator is fully initialized.
> + */
> +static bool early_work_area_in_use __initdata;
> +static char early_work_area_buf[SZ_4K] __initdata;

That should be page aligned I think?


> +static struct rtas_work_area early_work_area __initdata = {
> +	.buf = early_work_area_buf,
> +	.size = sizeof(early_work_area_buf),
> +};
> +
> +
> +static struct rtas_work_area * __init rtas_work_area_alloc_early(size_t size)
> +{
> +	WARN_ON(size > early_work_area.size);
> +	WARN_ON(early_work_area_in_use);
> +	early_work_area_in_use = true;
> +	memset(early_work_area.buf, 0, early_work_area.size);
> +	return &early_work_area;
> +}
> +
> +static void __init rtas_work_area_free_early(struct rtas_work_area *work_area)
> +{
> +	WARN_ON(work_area != &early_work_area);
> +	WARN_ON(!early_work_area_in_use);
> +	early_work_area_in_use = false;
> +}
> +
> +struct rtas_work_area * __ref rtas_work_area_alloc(size_t size)
> +{
> +	struct rtas_work_area *area;
> +	unsigned long addr;
> +
> +	might_sleep();
> +
> +	WARN_ON(size > RTAS_WORK_AREA_MAX_ALLOC_SZ);
> +	size = min_t(size_t, size, RTAS_WORK_AREA_MAX_ALLOC_SZ);

This seems unsafe.

If you return a buffer smaller than the caller asks for they're likely
to read/write past the end of it and corrupt memory.

AFAIK genalloc doesn't have guard pages or anything fancy to save us
from that - but maybe I'm wrong, I've never used it.

There's only three callers in the end, seems like we should just return
NULL if the size is too large and have callers check the return value.
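
A minimal userspace sketch of that behaviour, assuming malloc()/calloc() in
place of genalloc and hypothetical names (work_area_model,
work_area_model_alloc(), MODEL_MAX_ALLOC_SZ); the limit mirrors
RTAS_WORK_AREA_ARENA_SZ / 2 from the patch:

```c
#include <stddef.h>
#include <stdlib.h>

/* Mirrors RTAS_WORK_AREA_ARENA_SZ / 2 (256K / 2) from the patch. */
#define MODEL_MAX_ALLOC_SZ ((size_t)128 * 1024)

/* Hypothetical stand-in for struct rtas_work_area. */
struct work_area_model {
	char *buf;
	size_t size;
};

static struct work_area_model *work_area_model_alloc(size_t size)
{
	struct work_area_model *area;

	/* Refuse oversized requests instead of silently shrinking them. */
	if (size > MODEL_MAX_ALLOC_SZ)
		return NULL;

	area = malloc(sizeof(*area));
	if (!area)
		return NULL;

	area->buf = calloc(1, size);
	if (!area->buf) {
		free(area);
		return NULL;
	}
	area->size = size;
	return area;
}

static void work_area_model_free(struct work_area_model *area)
{
	if (!area)
		return;
	free(area->buf);
	free(area);
}
```

The cost of this approach is that every caller needs an error path for a
NULL return.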

> +	if (!rwa_state->available) {
> +		area = rtas_work_area_alloc_early(size);
> +		goto out;
> +	}
> +
> +	/*
> +	 * To ensure FCFS behavior and prevent a high rate of smaller
> +	 * requests from starving larger ones, use the mutex to queue
> +	 * allocations.
> +	 */
> +	mutex_lock(&rwa_state->mutex);
> +	wait_event(rwa_state->wqh,
> +		   (addr = gen_pool_alloc(rwa_state->gen_pool, size)) != 0);
> +	mutex_unlock(&rwa_state->mutex);
> +
> +	area = mempool_alloc(&rwa_state->descriptor_pool, GFP_KERNEL);
> +	*area = (typeof(*area)){
> +		.size = size,
> +		.buf = (char *)addr,
> +	};

That is an odd way to write that :)

> +out:
> +	pr_devel("%ps -> %s() -> buf=%p size=%zu\n",
> +		 (void *)_RET_IP_, __func__, area->buf, area->size);

Can we drop those? They need a recompile to enable, so if someone needs
debugging they can just rewrite them - or use some sort of tracing instead.

> +	return area;
> +}
> +
> +void __ref rtas_work_area_free(struct rtas_work_area *area)
> +{
> +	pr_devel("%ps -> %s() -> buf=%p size=%zu\n",
> +		 (void *)_RET_IP_, __func__, area->buf, area->size);

Ditto.
 
> +	if (!rwa_state->available) {
> +		rtas_work_area_free_early(area);
> +		return;
> +	}
> +
> +	gen_pool_free(rwa_state->gen_pool, (unsigned long)area->buf, area->size);
> +	mempool_free(area, &rwa_state->descriptor_pool);
> +	wake_up(&rwa_state->wqh);
> +}
> +
> +/*
> + * Initialization of the work area allocator happens in two parts. To
> + * reliably reserve an arena that satisfies RTAS addressing
> + * requirements, we must perform a memblock allocation early,
> + * immediately after RTAS instantiation. Then we have to wait until
> + * the slab allocator is up before setting up the descriptor mempool
> + * and adding the arena to a gen_pool.
> + */
> +static __init int rtas_work_area_allocator_init(void)
> +{
> +	const unsigned int order = ilog2(RTAS_WORK_AREA_MIN_ALLOC_SZ);
> +	const phys_addr_t pa_start = __pa(rwa_state->arena);
> +	const phys_addr_t pa_end = pa_start + RTAS_WORK_AREA_ARENA_SZ - 1;
> +	struct gen_pool *pool;
> +	const int nid = NUMA_NO_NODE;
> +	int err;
> +
> +	err = -ENOMEM;
> +	if (!rwa_state->arena)
> +		goto err_out;
> +
> +	pool = gen_pool_create(order, nid);
> +	if (!pool)
> +		goto err_out;
> +	/*
> +	 * All RTAS functions that consume work areas are OK with
> +	 * natural alignment, when they have alignment requirements at
> +	 * all.
> +	 */
> +	gen_pool_set_algo(pool, gen_pool_first_fit_order_align, NULL);
> +
> +	err = gen_pool_add(pool, (unsigned long)rwa_state->arena,
> +			   RTAS_WORK_AREA_ARENA_SZ, nid);
> +	if (err)
> +		goto err_destroy;
> +
> +	err = mempool_init_kmalloc_pool(&rwa_state->descriptor_pool, 1,
> +					sizeof(struct rtas_work_area));
> +	if (err)
> +		goto err_destroy;
> +
> +	rwa_state->gen_pool = pool;
> +	rwa_state->available = true;
> +
> +	pr_debug("arena [%pa-%pa] (%uK), min/max alloc sizes %u/%u\n",
> +		 &pa_start, &pa_end,
> +		 RTAS_WORK_AREA_ARENA_SZ / SZ_1K,
> +		 RTAS_WORK_AREA_MIN_ALLOC_SZ,
> +		 RTAS_WORK_AREA_MAX_ALLOC_SZ);
> +
> +	return 0;
> +
> +err_destroy:
> +	gen_pool_destroy(pool);
> +err_out:
> +	return err;
> +}
> +machine_arch_initcall(pseries, rtas_work_area_allocator_init);

Should it live in platforms/pseries then?

> +/**
> + * rtas_work_area_reserve_arena() - reserve memory suitable for RTAS work areas.
> + */
> +int __init rtas_work_area_reserve_arena(const phys_addr_t limit)
> +{
> +	const phys_addr_t align = RTAS_WORK_AREA_ARENA_ALIGN;
> +	const phys_addr_t size = RTAS_WORK_AREA_ARENA_SZ;
> +	const phys_addr_t min = MEMBLOCK_LOW_LIMIT;
> +	const int nid = NUMA_NO_NODE;

This should probably also be restricted to pseries?


cheers

* Re: [PATCH v2 18/19] powerpc/rtas: introduce rtas_function_token() API
  2023-02-06 18:54   ` Nathan Lynch via B4 Submission Endpoint
  (?)
@ 2023-02-08 12:09   ` Michael Ellerman
  2023-02-08 15:44     ` Nathan Lynch
  -1 siblings, 1 reply; 50+ messages in thread
From: Michael Ellerman @ 2023-02-08 12:09 UTC (permalink / raw)
  To: Nathan Lynch via B4 Submission Endpoint, Nicholas Piggin,
	Christophe Leroy, Kajol Jain, Laurent Dufour,
	Mahesh J Salgaonkar, Andrew Donnellan, Nick Child
  Cc: Nathan Lynch, linuxppc-dev

Nathan Lynch via B4 Submission Endpoint
<devnull+nathanl.linux.ibm.com@kernel.org> writes:
> From: Nathan Lynch <nathanl@linux.ibm.com>
>
> Users of rtas_token() supply a string argument that can't be validated
> at build time. A typo or misspelling has to be caught by inspection or
> by observing wrong behavior at runtime.
>
> Since the core RTAS code now has consolidated the names of all
> possible RTAS functions and mapped them to their tokens, token lookup
> can be implemented using symbolic constants to index a static array.
>
> So introduce rtas_function_token(), a replacement API which does that,
> along with a rtas_service_present()-equivalent helper,
> rtas_function_implemented(). Callers supply an opaque predefined
> function handle which is used internally to index the function
> table. Typos or other inappropriate arguments yield build errors, and
> the function handle is a type that can't be easily confused with RTAS
> tokens or other integer types.

Why not go all the way and have the rtas_call() signature be:

  int rtas_call(rtas_fn_handle_t fn, int, int, int *, ...);


And have it do the token lookup internally? That way a caller can never
inadvertently pass a random integer to rtas_call().

And instead of eg:

	error = rtas_call(rtas_function_token(RTAS_FN_GET_TIME_OF_DAY), 0, 8, ret);

we'd just need:

	error = rtas_call(RTAS_FN_GET_TIME_OF_DAY, 0, 8, ret);


Doing the conversion all at once might be tricky. So maybe we need to add
rtas_fn_call() which takes rtas_fn_handle_t so we can convert cases individually?

Anyway just a thought. I guess we could merge this as-is and then do a
further change to use rtas_fn_handle_t later.
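
To make the incremental route concrete, here is a userspace sketch of a
handle-taking wrapper layered over a token-taking call. All names
(model_fn_handle_t, model_call(), model_fn_call()) and the token values are
made up for illustration, and the variadic arguments of the real
rtas_call() are left out:

```c
#include <stddef.h>

#define MODEL_UNKNOWN_SERVICE (-1)

enum model_fn_index {
	MODEL_FNIDX_GET_TIME_OF_DAY,
	MODEL_FNIDX_EVENT_SCAN,
	MODEL_FNIDX_COUNT,
};

/* Wrapping the index in a struct keeps it from mixing with plain ints. */
typedef struct {
	enum model_fn_index index;
} model_fn_handle_t;

#define MODEL_FN_GET_TIME_OF_DAY \
	((model_fn_handle_t){ .index = MODEL_FNIDX_GET_TIME_OF_DAY })
#define MODEL_FN_EVENT_SCAN \
	((model_fn_handle_t){ .index = MODEL_FNIDX_EVENT_SCAN })

/* Invented token values; real ones come from the device tree. */
static const int model_token_table[MODEL_FNIDX_COUNT] = {
	[MODEL_FNIDX_GET_TIME_OF_DAY] = 3,
	[MODEL_FNIDX_EVENT_SCAN] = MODEL_UNKNOWN_SERVICE,
};

static int model_last_token;

/* Stand-in for the existing token-based entry point. */
static int model_call(int token, int nargs, int nret)
{
	(void)nargs;
	(void)nret;
	model_last_token = token;
	return token == MODEL_UNKNOWN_SERVICE ? -1 : 0;
}

/* New entry point: callers pass a handle, the lookup happens inside. */
static int model_fn_call(model_fn_handle_t fn, int nargs, int nret)
{
	if ((size_t)fn.index >= MODEL_FNIDX_COUNT)
		return -1;
	return model_call(model_token_table[fn.index], nargs, nret);
}
```

With both entry points available, call sites could be converted one at a
time, and passing a bare integer to model_fn_call() fails to compile.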

> diff --git a/arch/powerpc/include/asm/rtas.h b/arch/powerpc/include/asm/rtas.h
> index 14fe79217c26..fe400438c1fb 100644
> --- a/arch/powerpc/include/asm/rtas.h
> +++ b/arch/powerpc/include/asm/rtas.h
> @@ -103,6 +103,99 @@ enum rtas_function_index {
>  	rtas_fnidx(WRITE_PCI_CONFIG),
>  };
>  
> +/*
> + * Opaque handle for client code to refer to RTAS functions. All valid
> + * function handles are build-time constants prefixed with RTAS_FN_.
> + */
> +typedef struct {
> +	const enum rtas_function_index index;
> +} rtas_fn_handle_t;
> +
> +#define rtas_fn_handle(x_) ((const rtas_fn_handle_t) { .index = rtas_fnidx(x_), })
> +
> +#define RTAS_FN_CHECK_EXCEPTION                   rtas_fn_handle(CHECK_EXCEPTION)
> +#define RTAS_FN_DISPLAY_CHARACTER                 rtas_fn_handle(DISPLAY_CHARACTER)
> +#define RTAS_FN_EVENT_SCAN                        rtas_fn_handle(EVENT_SCAN)
> +#define RTAS_FN_FREEZE_TIME_BASE                  rtas_fn_handle(FREEZE_TIME_BASE)
> +#define RTAS_FN_GET_POWER_LEVEL                   rtas_fn_handle(GET_POWER_LEVEL)
> +#define RTAS_FN_GET_SENSOR_STATE                  rtas_fn_handle(GET_SENSOR_STATE)
> +#define RTAS_FN_GET_TERM_CHAR                     rtas_fn_handle(GET_TERM_CHAR)
> +#define RTAS_FN_GET_TIME_OF_DAY                   rtas_fn_handle(GET_TIME_OF_DAY)
> +#define RTAS_FN_IBM_ACTIVATE_FIRMWARE             rtas_fn_handle(IBM_ACTIVATE_FIRMWARE)
> +#define RTAS_FN_IBM_CBE_START_PTCAL               rtas_fn_handle(IBM_CBE_START_PTCAL)
> +#define RTAS_FN_IBM_CBE_STOP_PTCAL                rtas_fn_handle(IBM_CBE_STOP_PTCAL)
> +#define RTAS_FN_IBM_CHANGE_MSI                    rtas_fn_handle(IBM_CHANGE_MSI)
> +#define RTAS_FN_IBM_CLOSE_ERRINJCT                rtas_fn_handle(IBM_CLOSE_ERRINJCT)
> +#define RTAS_FN_IBM_CONFIGURE_BRIDGE              rtas_fn_handle(IBM_CONFIGURE_BRIDGE)
> +#define RTAS_FN_IBM_CONFIGURE_CONNECTOR           rtas_fn_handle(IBM_CONFIGURE_CONNECTOR)
> +#define RTAS_FN_IBM_CONFIGURE_KERNEL_DUMP         rtas_fn_handle(IBM_CONFIGURE_KERNEL_DUMP)
> +#define RTAS_FN_IBM_CONFIGURE_PE                  rtas_fn_handle(IBM_CONFIGURE_PE)
> +#define RTAS_FN_IBM_CREATE_PE_DMA_WINDOW          rtas_fn_handle(IBM_CREATE_PE_DMA_WINDOW)
> +#define RTAS_FN_IBM_DISPLAY_MESSAGE               rtas_fn_handle(IBM_DISPLAY_MESSAGE)
> +#define RTAS_FN_IBM_ERRINJCT                      rtas_fn_handle(IBM_ERRINJCT)
> +#define RTAS_FN_IBM_EXTI2C                        rtas_fn_handle(IBM_EXTI2C)
> +#define RTAS_FN_IBM_GET_CONFIG_ADDR_INFO          rtas_fn_handle(IBM_GET_CONFIG_ADDR_INFO)
> +#define RTAS_FN_IBM_GET_CONFIG_ADDR_INFO2         rtas_fn_handle(IBM_GET_CONFIG_ADDR_INFO2)
> +#define RTAS_FN_IBM_GET_DYNAMIC_SENSOR_STATE      rtas_fn_handle(IBM_GET_DYNAMIC_SENSOR_STATE)
> +#define RTAS_FN_IBM_GET_INDICES                   rtas_fn_handle(IBM_GET_INDICES)
> +#define RTAS_FN_IBM_GET_RIO_TOPOLOGY              rtas_fn_handle(IBM_GET_RIO_TOPOLOGY)
> +#define RTAS_FN_IBM_GET_SYSTEM_PARAMETER          rtas_fn_handle(IBM_GET_SYSTEM_PARAMETER)
> +#define RTAS_FN_IBM_GET_VPD                       rtas_fn_handle(IBM_GET_VPD)
> +#define RTAS_FN_IBM_GET_XIVE                      rtas_fn_handle(IBM_GET_XIVE)
> +#define RTAS_FN_IBM_INT_OFF                       rtas_fn_handle(IBM_INT_OFF)
> +#define RTAS_FN_IBM_INT_ON                        rtas_fn_handle(IBM_INT_ON)
> +#define RTAS_FN_IBM_IO_QUIESCE_ACK                rtas_fn_handle(IBM_IO_QUIESCE_ACK)
> +#define RTAS_FN_IBM_LPAR_PERFTOOLS                rtas_fn_handle(IBM_LPAR_PERFTOOLS)
> +#define RTAS_FN_IBM_MANAGE_FLASH_IMAGE            rtas_fn_handle(IBM_MANAGE_FLASH_IMAGE)
> +#define RTAS_FN_IBM_MANAGE_STORAGE_PRESERVATION   rtas_fn_handle(IBM_MANAGE_STORAGE_PRESERVATION)
> +#define RTAS_FN_IBM_NMI_INTERLOCK                 rtas_fn_handle(IBM_NMI_INTERLOCK)
> +#define RTAS_FN_IBM_NMI_REGISTER                  rtas_fn_handle(IBM_NMI_REGISTER)
> +#define RTAS_FN_IBM_OPEN_ERRINJCT                 rtas_fn_handle(IBM_OPEN_ERRINJCT)
> +#define RTAS_FN_IBM_OPEN_SRIOV_ALLOW_UNFREEZE     rtas_fn_handle(IBM_OPEN_SRIOV_ALLOW_UNFREEZE)
> +#define RTAS_FN_IBM_OPEN_SRIOV_MAP_PE_NUMBER      rtas_fn_handle(IBM_OPEN_SRIOV_MAP_PE_NUMBER)
> +#define RTAS_FN_IBM_OS_TERM                       rtas_fn_handle(IBM_OS_TERM)
> +#define RTAS_FN_IBM_PARTNER_CONTROL               rtas_fn_handle(IBM_PARTNER_CONTROL)
> +#define RTAS_FN_IBM_PHYSICAL_ATTESTATION          rtas_fn_handle(IBM_PHYSICAL_ATTESTATION)
> +#define RTAS_FN_IBM_PLATFORM_DUMP                 rtas_fn_handle(IBM_PLATFORM_DUMP)
> +#define RTAS_FN_IBM_POWER_OFF_UPS                 rtas_fn_handle(IBM_POWER_OFF_UPS)
> +#define RTAS_FN_IBM_QUERY_INTERRUPT_SOURCE_NUMBER rtas_fn_handle(IBM_QUERY_INTERRUPT_SOURCE_NUMBER)
> +#define RTAS_FN_IBM_QUERY_PE_DMA_WINDOW           rtas_fn_handle(IBM_QUERY_PE_DMA_WINDOW)
> +#define RTAS_FN_IBM_READ_PCI_CONFIG               rtas_fn_handle(IBM_READ_PCI_CONFIG)
> +#define RTAS_FN_IBM_READ_SLOT_RESET_STATE         rtas_fn_handle(IBM_READ_SLOT_RESET_STATE)
> +#define RTAS_FN_IBM_READ_SLOT_RESET_STATE2        rtas_fn_handle(IBM_READ_SLOT_RESET_STATE2)
> +#define RTAS_FN_IBM_REMOVE_PE_DMA_WINDOW          rtas_fn_handle(IBM_REMOVE_PE_DMA_WINDOW)
> +#define RTAS_FN_IBM_RESET_PE_DMA_WINDOWS          rtas_fn_handle(IBM_RESET_PE_DMA_WINDOWS)
> +#define RTAS_FN_IBM_SCAN_LOG_DUMP                 rtas_fn_handle(IBM_SCAN_LOG_DUMP)
> +#define RTAS_FN_IBM_SET_DYNAMIC_INDICATOR         rtas_fn_handle(IBM_SET_DYNAMIC_INDICATOR)
> +#define RTAS_FN_IBM_SET_EEH_OPTION                rtas_fn_handle(IBM_SET_EEH_OPTION)
> +#define RTAS_FN_IBM_SET_SLOT_RESET                rtas_fn_handle(IBM_SET_SLOT_RESET)
> +#define RTAS_FN_IBM_SET_SYSTEM_PARAMETER          rtas_fn_handle(IBM_SET_SYSTEM_PARAMETER)
> +#define RTAS_FN_IBM_SET_XIVE                      rtas_fn_handle(IBM_SET_XIVE)
> +#define RTAS_FN_IBM_SLOT_ERROR_DETAIL             rtas_fn_handle(IBM_SLOT_ERROR_DETAIL)
> +#define RTAS_FN_IBM_SUSPEND_ME                    rtas_fn_handle(IBM_SUSPEND_ME)
> +#define RTAS_FN_IBM_TUNE_DMA_PARMS                rtas_fn_handle(IBM_TUNE_DMA_PARMS)
> +#define RTAS_FN_IBM_UPDATE_FLASH_64_AND_REBOOT    rtas_fn_handle(IBM_UPDATE_FLASH_64_AND_REBOOT)
> +#define RTAS_FN_IBM_UPDATE_NODES                  rtas_fn_handle(IBM_UPDATE_NODES)
> +#define RTAS_FN_IBM_UPDATE_PROPERTIES             rtas_fn_handle(IBM_UPDATE_PROPERTIES)
> +#define RTAS_FN_IBM_VALIDATE_FLASH_IMAGE          rtas_fn_handle(IBM_VALIDATE_FLASH_IMAGE)
> +#define RTAS_FN_IBM_WRITE_PCI_CONFIG              rtas_fn_handle(IBM_WRITE_PCI_CONFIG)
> +#define RTAS_FN_NVRAM_FETCH                       rtas_fn_handle(NVRAM_FETCH)
> +#define RTAS_FN_NVRAM_STORE                       rtas_fn_handle(NVRAM_STORE)
> +#define RTAS_FN_POWER_OFF                         rtas_fn_handle(POWER_OFF)
> +#define RTAS_FN_PUT_TERM_CHAR                     rtas_fn_handle(PUT_TERM_CHAR)
> +#define RTAS_FN_QUERY_CPU_STOPPED_STATE           rtas_fn_handle(QUERY_CPU_STOPPED_STATE)
> +#define RTAS_FN_READ_PCI_CONFIG                   rtas_fn_handle(READ_PCI_CONFIG)
> +#define RTAS_FN_RTAS_LAST_ERROR                   rtas_fn_handle(RTAS_LAST_ERROR)
> +#define RTAS_FN_SET_INDICATOR                     rtas_fn_handle(SET_INDICATOR)
> +#define RTAS_FN_SET_POWER_LEVEL                   rtas_fn_handle(SET_POWER_LEVEL)
> +#define RTAS_FN_SET_TIME_FOR_POWER_ON             rtas_fn_handle(SET_TIME_FOR_POWER_ON)
> +#define RTAS_FN_SET_TIME_OF_DAY                   rtas_fn_handle(SET_TIME_OF_DAY)
> +#define RTAS_FN_START_CPU                         rtas_fn_handle(START_CPU)
> +#define RTAS_FN_STOP_SELF                         rtas_fn_handle(STOP_SELF)
> +#define RTAS_FN_SYSTEM_REBOOT                     rtas_fn_handle(SYSTEM_REBOOT)
> +#define RTAS_FN_THAW_TIME_BASE                    rtas_fn_handle(THAW_TIME_BASE)
> +#define RTAS_FN_WRITE_PCI_CONFIG                  rtas_fn_handle(WRITE_PCI_CONFIG)
> +
>  #define RTAS_UNKNOWN_SERVICE (-1)
>  #define RTAS_INSTANTIATE_MAX (1ULL<<30) /* Don't instantiate rtas at/above this value */
>  
> @@ -309,6 +402,11 @@ extern void (*rtas_flash_term_hook)(int);
>  
>  extern struct rtas_t rtas;
>  
> +s32 rtas_function_token(const rtas_fn_handle_t handle);
> +static inline bool rtas_function_implemented(const rtas_fn_handle_t handle)
> +{
> +	return rtas_function_token(handle) != RTAS_UNKNOWN_SERVICE;
> +}
>  extern int rtas_token(const char *service);
>  extern int rtas_service_present(const char *service);
>  extern int rtas_call(int token, int, int, int *, ...);
> diff --git a/arch/powerpc/kernel/rtas.c b/arch/powerpc/kernel/rtas.c
> index 41c430dc40c2..17e59306ce63 100644
> --- a/arch/powerpc/kernel/rtas.c
> +++ b/arch/powerpc/kernel/rtas.c
> @@ -453,6 +453,26 @@ static struct rtas_function rtas_function_table[] __ro_after_init = {
>  	},
>  };
>  
> +/**
> + * rtas_function_token() - RTAS function token lookup.
> + * @handle: Function handle, e.g. RTAS_FN_EVENT_SCAN.
> + *
> + * Context: Any context.
> + * Return: the token value for the function if implemented by this platform,
> + *         otherwise RTAS_UNKNOWN_SERVICE.
> + */
> +s32 rtas_function_token(const rtas_fn_handle_t handle)
> +{
> +	const size_t index = handle.index;
> +	const bool out_of_bounds = index >= ARRAY_SIZE(rtas_function_table);
> +
> +	if (WARN_ONCE(out_of_bounds, "invalid function index %zu", index))
> +		return RTAS_UNKNOWN_SERVICE;

This needs:

+	// If RTAS is not present or not initialised (yet) return unknown
+	if (!rtas.dev)
+		return RTAS_UNKNOWN_SERVICE;
+

Otherwise powernv breaks because it looks up tokens and gets back 0,
because we never got as far as rtas_function_table_init() (to set all the
tokens to RTAS_UNKNOWN_SERVICE), because we bailed out at the start of
rtas_initialize() when we found no /rtas node.
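
The failure mode can be reproduced in a small userspace model. The names
here (model_table, model_rtas_present, model_token_guarded()) are
hypothetical stand-ins; model_rtas_present plays the role of rtas.dev
being non-NULL:

```c
#include <stdbool.h>
#include <stddef.h>

#define MODEL_UNKNOWN_SERVICE (-1)
#define MODEL_TABLE_LEN 4

struct model_function {
	int token;
};

/* Static storage is zero-initialized, so every token starts out as 0. */
static struct model_function model_table[MODEL_TABLE_LEN];

/* Stands in for "rtas.dev != NULL". */
static bool model_rtas_present;

/* Only runs when firmware is found; marks all functions unknown. */
static void model_table_init(void)
{
	for (size_t i = 0; i < MODEL_TABLE_LEN; i++)
		model_table[i].token = MODEL_UNKNOWN_SERVICE;
	model_rtas_present = true;
}

/* Without the guard, an uninitialized table yields token 0. */
static int model_token_unguarded(size_t index)
{
	return model_table[index].token;
}

/* With the guard, "no RTAS" is reported as unknown service. */
static int model_token_guarded(size_t index)
{
	if (!model_rtas_present)
		return MODEL_UNKNOWN_SERVICE;
	return model_table[index].token;
}
```

Before model_table_init() runs, the unguarded lookup returns 0, which a
caller would treat as a valid token.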

> +	return rtas_function_table[index].token;
> +}
> +EXPORT_SYMBOL_GPL(rtas_function_token);
> +
>  static int rtas_function_cmp(const void *a, const void *b)
>  {
>  	const struct rtas_function *f1 = a;


cheers

* Re: [PATCH v2 01/19] powerpc/rtas: handle extended delays safely in early boot
  2023-02-08 11:20   ` Michael Ellerman
@ 2023-02-08 13:14     ` Nathan Lynch
  2023-02-10  5:54       ` Michael Ellerman
  0 siblings, 1 reply; 50+ messages in thread
From: Nathan Lynch @ 2023-02-08 13:14 UTC (permalink / raw)
  To: Michael Ellerman, Nathan Lynch via B4 Submission Endpoint,
	Nicholas Piggin, Christophe Leroy, Kajol Jain, Laurent Dufour,
	Mahesh J Salgaonkar, Andrew Donnellan, Nick Child
  Cc: linuxppc-dev

Michael Ellerman <mpe@ellerman.id.au> writes:
> Nathan Lynch via B4 Submission Endpoint <devnull+nathanl.linux.ibm.com@kernel.org> writes:
>> From: Nathan Lynch <nathanl@linux.ibm.com>
>>
>> Some code that runs early in boot calls RTAS functions that can return
>> -2 or 990x statuses, which mean the caller should retry. An example is
>> pSeries_cmo_feature_init(), which invokes ibm,get-system-parameter but
>> treats these benign statuses as errors instead of retrying.
>>
>> pSeries_cmo_feature_init() and similar code should be made to retry
>> until they succeed or receive a real error, using the usual pattern:
>>
>> 	do {
>> 		rc = rtas_call(token, etc...);
>> 	} while (rtas_busy_delay(rc));
>>
>> But rtas_busy_delay() will perform a timed sleep on any 990x
>> status. This isn't safe so early in boot, before the CPU scheduler and
>> timer subsystem have initialized.
>>
>> The -2 RTAS status is much more likely to occur during single-threaded
>> boot than 990x in practice, at least on PowerVM. This is because -2
>> usually means that RTAS made progress but exhausted its self-imposed
>> timeslice, while 990x is associated with concurrent requests from the
>> OS causing internal contention. Regardless, according to the language
>> in PAPR, the OS should be prepared to handle either type of status at
>> any time.
>>
>> Add a fallback path to rtas_busy_delay() to handle this as safely as
>> possible, performing a small delay on 990x. Include a counter to
>> detect retry loops that aren't making progress and bail out.
>>
>> This was found by inspection and I'm not aware of any real
>> failures. However, the implementation of rtas_busy_delay() before
>> commit 38f7b7067dae ("powerpc/rtas: rtas_busy_delay() improvements")
>> was not susceptible to this problem, so let's treat this as a
>> regression.
>>
>> Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
>> Fixes: 38f7b7067dae ("powerpc/rtas: rtas_busy_delay() improvements")
>> ---
>>  arch/powerpc/kernel/rtas.c | 48 +++++++++++++++++++++++++++++++++++++++++++++-
>>  1 file changed, 47 insertions(+), 1 deletion(-)
>>
>> diff --git a/arch/powerpc/kernel/rtas.c b/arch/powerpc/kernel/rtas.c
>> index 795225d7f138..ec2df09a70cf 100644
>> --- a/arch/powerpc/kernel/rtas.c
>> +++ b/arch/powerpc/kernel/rtas.c
>> @@ -606,6 +606,46 @@ unsigned int rtas_busy_delay_time(int status)
>>  	return ms;
>>  }
>>  
>> +/*
>> + * Early boot fallback for rtas_busy_delay().
>> + */
>> +static bool __init rtas_busy_delay_early(int status)
>> +{
>> +	static size_t successive_ext_delays __initdata;
>> +	bool ret;
>
> I think the logic would be easier to read if this was called "wait", but
> maybe that's just me.

Maybe "retry"? That communicates what the function is telling callers to do.

>
>> +	switch (status) {
>> +	case RTAS_EXTENDED_DELAY_MIN...RTAS_EXTENDED_DELAY_MAX:
>> +		/*
>> +		 * In the unlikely case that we receive an extended
>> +		 * delay status in early boot, the OS is probably not
>> +		 * the cause, and there's nothing we can do to clear
>> +		 * the condition. Best we can do is delay for a bit
>> +		 * and hope it's transient. Lie to the caller if it
>> +		 * seems like we're stuck in a retry loop.
>> +		 */
>> +		mdelay(1);
>> +		ret = true;
>> +		successive_ext_delays += 1;
>> +		if (successive_ext_delays > 1000) {
>> +			pr_err("too many extended delays, giving up\n");
>> +			dump_stack();
>> +			ret = false;
>
> Shouldn't we zero successive_ext_delays here?
>
> Otherwise a subsequent (possibly different) RTAS call will immediately
> fail out here if it gets a single extended delay from RTAS, won't it?

Yes, will fix. Thanks.
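
For reference, a userspace sketch of the early path with the counter reset
on the give-up branch and the flag renamed to "retry". This is a model, not
the revised patch: the mdelay(1), pr_err() and dump_stack() calls are left
out, model_busy_delay_early() is a hypothetical name, and the status
constants assume -2 for busy and 9900..9905 for extended delays.

```c
#include <stdbool.h>
#include <stddef.h>

#define MODEL_BUSY           (-2)
#define MODEL_EXT_DELAY_MIN  9900
#define MODEL_EXT_DELAY_MAX  9905
#define MODEL_MAX_EXT_DELAYS 1000

static bool model_busy_delay_early(int status)
{
	static size_t successive_ext_delays;
	bool retry;

	if (status >= MODEL_EXT_DELAY_MIN && status <= MODEL_EXT_DELAY_MAX) {
		/* The real code would also mdelay(1) here. */
		retry = true;
		successive_ext_delays += 1;
		if (successive_ext_delays > MODEL_MAX_EXT_DELAYS) {
			/*
			 * Give up on this call sequence, and reset the
			 * counter so the next RTAS call starts fresh.
			 */
			retry = false;
			successive_ext_delays = 0;
		}
	} else if (status == MODEL_BUSY) {
		retry = true;
		successive_ext_delays = 0;
	} else {
		retry = false;
		successive_ext_delays = 0;
	}

	return retry;
}
```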

>
>> +		}
>> +		break;
>> +	case RTAS_BUSY:
>> +		ret = true;
>> +		successive_ext_delays = 0;
>> +		break;
>> +	default:
>> +		ret = false;
>> +		successive_ext_delays = 0;
>> +		break;
>> +	}
>> +
>> +	return ret;
>> +}
>> +
>>  /**
>>   * rtas_busy_delay() - helper for RTAS busy and extended delay statuses
>>   *
>> @@ -624,11 +664,17 @@ unsigned int rtas_busy_delay_time(int status)
>>   * * false - @status is not @RTAS_BUSY nor an extended delay hint. The
>>   *           caller is responsible for handling @status.
>>   */
>> -bool rtas_busy_delay(int status)
>> +bool __ref rtas_busy_delay(int status)
>
> Can you explain the __ref in the change log?

Yes, will add that.


>>  {
>>  	unsigned int ms;
>>  	bool ret;
>>  
>> +	/*
>> +	 * Can't do timed sleeps before timekeeping is up.
>> +	 */
>> +	if (system_state < SYSTEM_SCHEDULING)
>> +		return rtas_busy_delay_early(status);
>> +
>>  	switch (status) {
>>  	case RTAS_EXTENDED_DELAY_MIN...RTAS_EXTENDED_DELAY_MAX:
>>  		ret = true;
>>
>
> cheers

* Re: [PATCH v2 07/19] powerpc/rtas: improve function information lookups
  2023-02-08 11:57   ` Michael Ellerman
@ 2023-02-08 13:16     ` Nathan Lynch
  0 siblings, 0 replies; 50+ messages in thread
From: Nathan Lynch @ 2023-02-08 13:16 UTC (permalink / raw)
  To: Michael Ellerman, Nathan Lynch via B4 Submission Endpoint,
	Nicholas Piggin, Christophe Leroy, Kajol Jain, Laurent Dufour,
	Mahesh J Salgaonkar, Andrew Donnellan, Nick Child
  Cc: linuxppc-dev

Michael Ellerman <mpe@ellerman.id.au> writes:
> Nathan Lynch via B4 Submission Endpoint
> <devnull+nathanl.linux.ibm.com@kernel.org> writes:
>> From: Nathan Lynch <nathanl@linux.ibm.com>
>>
>> The core RTAS support code and its clients perform two types of lookup
>> for RTAS firmware function information.
>> 
> ...
>> diff --git a/arch/powerpc/include/asm/rtas.h b/arch/powerpc/include/asm/rtas.h
>> index 479a95cb2770..14fe79217c26 100644
>> --- a/arch/powerpc/include/asm/rtas.h
>> +++ b/arch/powerpc/include/asm/rtas.h
>> @@ -16,6 +16,93 @@
>>   * Copyright (C) 2001 PPC 64 Team, IBM Corp
>>   */
>>  
>> +#define rtas_fnidx(x_) RTAS_FNIDX__ ## x_
>
> I'd prefer we just spelt it out in full, to aid grepability and
> cscope/tags etc.

OK.


* Re: [PATCH v2 11/19] powerpc/rtas: add work area allocator
  2023-02-08 11:58   ` Michael Ellerman
@ 2023-02-08 14:48     ` Nathan Lynch
  2023-02-10  6:07       ` Michael Ellerman
  0 siblings, 1 reply; 50+ messages in thread
From: Nathan Lynch @ 2023-02-08 14:48 UTC (permalink / raw)
  To: Michael Ellerman, Nathan Lynch via B4 Submission Endpoint,
	Nicholas Piggin, Christophe Leroy, Kajol Jain, Laurent Dufour,
	Mahesh J Salgaonkar, Andrew Donnellan, Nick Child
  Cc: linuxppc-dev

Michael Ellerman <mpe@ellerman.id.au> writes:
> Nathan Lynch via B4 Submission Endpoint
> <devnull+nathanl.linux.ibm.com@kernel.org> writes:
>> diff --git a/arch/powerpc/include/asm/rtas-work-area.h b/arch/powerpc/include/asm/rtas-work-area.h
>> new file mode 100644
>> index 000000000000..76ccb039cc37
>> --- /dev/null
>> +++ b/arch/powerpc/include/asm/rtas-work-area.h
>> @@ -0,0 +1,45 @@
>> +/* SPDX-License-Identifier: GPL-2.0-only */
>> +#ifndef POWERPC_RTAS_WORK_AREA_H
>> +#define POWERPC_RTAS_WORK_AREA_H
>
> The usual style would be _ASM_POWERPC_RTAS_WORK_AREA_H.

OK. (will change in all new headers)

>> +static struct rtas_work_area_allocator_state {
>> +	struct gen_pool *gen_pool;
>> +	char *arena;
>> +	struct mutex mutex; /* serializes allocations */
>> +	struct wait_queue_head wqh;
>> +	mempool_t descriptor_pool;
>> +	bool available;
>> +} rwa_state_ = {
>> +	.mutex = __MUTEX_INITIALIZER(rwa_state_.mutex),
>> +	.wqh = __WAIT_QUEUE_HEAD_INITIALIZER(rwa_state_.wqh),
>> +};
>> +static struct rtas_work_area_allocator_state *rwa_state = &rwa_state_;
>
> I assumed the pointer was so you could swap this out at runtime or
> something, but I don't think you do.
>
> Any reason not to drop the pointer and just use rwa_state.foo accessors?
> That would also allow the struct to be anonymous.
>
> Or if you have the pointer you can at least make it NULL prior to init
> and avoid the need for "available".

I think it's there because earlier versions of this that I never posted
had unit tests. I'll either resurrect those or reduce the indirection.


>> +/*
>> + * A single work area buffer and descriptor to serve requests early in
>> + * boot before the allocator is fully initialized.
>> + */
>> +static bool early_work_area_in_use __initdata;
>> +static char early_work_area_buf[SZ_4K] __initdata;
>
> That should be page aligned I think?

Yes. It happens to be safe in this version because ibm,get-system-parameter,
which has no alignment requirement, is the only function used early
enough to use the buffer. But that's too fragile.
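
A small userspace sketch of an explicitly aligned static buffer, using C11
alignas. MODEL_PAGE_SIZE and the other names are hypothetical; in the
kernel the equivalent would presumably be an __aligned(PAGE_SIZE)
annotation on the buffer.

```c
#include <stdalign.h>
#include <stdint.h>

/* Assumed page size for this model only. */
#define MODEL_PAGE_SIZE 4096

/* Explicit alignment instead of relying on where the linker puts it. */
static alignas(MODEL_PAGE_SIZE) char model_early_buf[MODEL_PAGE_SIZE];

static int model_is_page_aligned(const void *p)
{
	return ((uintptr_t)p % MODEL_PAGE_SIZE) == 0;
}
```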


>> +static struct rtas_work_area early_work_area __initdata = {
>> +	.buf = early_work_area_buf,
>> +	.size = sizeof(early_work_area_buf),
>> +};
>> +
>> +
>> +static struct rtas_work_area * __init rtas_work_area_alloc_early(size_t size)
>> +{
>> +	WARN_ON(size > early_work_area.size);
>> +	WARN_ON(early_work_area_in_use);
>> +	early_work_area_in_use = true;
>> +	memset(early_work_area.buf, 0, early_work_area.size);
>> +	return &early_work_area;
>> +}
>> +
>> +static void __init rtas_work_area_free_early(struct rtas_work_area *work_area)
>> +{
>> +	WARN_ON(work_area != &early_work_area);
>> +	WARN_ON(!early_work_area_in_use);
>> +	early_work_area_in_use = false;
>> +}
>> +
>> +struct rtas_work_area * __ref rtas_work_area_alloc(size_t size)
>> +{
>> +	struct rtas_work_area *area;
>> +	unsigned long addr;
>> +
>> +	might_sleep();
>> +
>> +	WARN_ON(size > RTAS_WORK_AREA_MAX_ALLOC_SZ);
>> +	size = min_t(size_t, size, RTAS_WORK_AREA_MAX_ALLOC_SZ);
>
> This seems unsafe.
>
> If you return a buffer smaller than the caller asks for they're likely
> to read/write past the end of it and corrupt memory.

OK, let's figure out another way to handle this.

> AFAIK genalloc doesn't have guard pages or anything fancy to save us
> from that - but maybe I'm wrong, I've never used it.

Yeah we would have to build our own thing on top of it. And I don't
think it could be something that traps on access, it would have to be a
check in rtas_work_area_free(), after the fact.
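
One way such an after-the-fact check could work is a canary placed just
past the requested size and verified on release. The sketch below is a
userspace model with hypothetical names (model_area, model_area_init(),
model_area_release()) and calloc() in place of genalloc; it detects an
overrun only when the buffer is released, it does not prevent one.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define MODEL_CANARY_LEN 8

static const unsigned char model_canary[MODEL_CANARY_LEN] = {
	0xde, 0xad, 0xbe, 0xef, 0xde, 0xad, 0xbe, 0xef,
};

struct model_area {
	char *buf;
	size_t size; /* size visible to the caller, canary excluded */
};

/* Allocate size bytes plus a trailing canary. */
static bool model_area_init(struct model_area *area, size_t size)
{
	area->buf = calloc(1, size + MODEL_CANARY_LEN);
	if (!area->buf)
		return false;
	area->size = size;
	memcpy(area->buf + size, model_canary, MODEL_CANARY_LEN);
	return true;
}

/* Free the buffer; returns true if the canary was still intact. */
static bool model_area_release(struct model_area *area)
{
	bool intact = memcmp(area->buf + area->size, model_canary,
			     MODEL_CANARY_LEN) == 0;

	free(area->buf);
	area->buf = NULL;
	return intact;
}
```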

> There's only three callers in the end, seems like we should just return
> NULL if the size is too large and have callers check the return value.

There are more conversions to do, and a property I hope to maintain is
that requests can't fail. Existing users of rtas_data_buf don't have
error paths for failure to acquire the buffer.

I believe the allocation size passed to rtas_work_area_alloc() can be
known at build time in all cases. Maybe we could prevent inappropriate
requests from being built with a compile-time assertion (untested):

/* rtas-work-area.h */

static inline struct rtas_work_area *rtas_work_area_alloc(size_t sz)
{
	static_assert(sz < RTAS_WORK_AREA_MAX_ALLOC_SZ);
	return __rtas_work_area_alloc(sz);
}

I think this would be OK? If I can't make it work I'll fall back to
returning NULL as you suggest, but it will make for more churn (and
risk) in the conversions.
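
One possible shape for the compile-time check, sketched in user space. A
function parameter is not an integer constant expression, so this version
uses a macro wrapper instead of an inline function, and it uses <= so that
a request of exactly the maximum size is accepted. It relies on GNU C
statement expressions. All example_* names and the 4096-byte limit are
hypothetical, and the malloc-based backend is only a placeholder for the
real allocator (unlike the real one, it can fail):

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical limit standing in for RTAS_WORK_AREA_MAX_ALLOC_SZ. */
#define EXAMPLE_MAX_ALLOC_SZ 4096

struct example_work_area {
	char *buf;
	size_t size;
};

/* Unchecked backend; callers are expected to go through example_alloc(). */
static struct example_work_area *example_alloc_unchecked(size_t size)
{
	struct example_work_area *area = malloc(sizeof(*area));

	if (!area)
		return NULL;
	area->buf = calloc(1, size);	/* zeroed, like the memset() in the patch */
	if (!area->buf) {
		free(area);
		return NULL;
	}
	area->size = size;
	return area;
}

static void example_free(struct example_work_area *area)
{
	free(area->buf);
	free(area);
}

/*
 * Build-time size validation. static_assert() needs an integer constant
 * expression, so a non-constant, zero, or oversized size_ fails to compile.
 */
#define example_alloc(size_) ({						\
	static_assert((size_) > 0, "size must be positive");		\
	static_assert((size_) <= EXAMPLE_MAX_ALLOC_SZ, "size too large");	\
	example_alloc_unchecked(size_);					\
})
```

A call such as example_alloc(64) builds, while example_alloc(8192) or a
call with a runtime variable as the size is rejected by the compiler.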


>> +	if (!rwa_state->available) {
>> +		area = rtas_work_area_alloc_early(size);
>> +		goto out;
>> +	}
>> +
>> +	/*
>> +	 * To ensure FCFS behavior and prevent a high rate of smaller
>> +	 * requests from starving larger ones, use the mutex to queue
>> +	 * allocations.
>> +	 */
>> +	mutex_lock(&rwa_state->mutex);
>> +	wait_event(rwa_state->wqh,
>> +		   (addr = gen_pool_alloc(rwa_state->gen_pool, size)) != 0);
>> +	mutex_unlock(&rwa_state->mutex);
>> +
>> +	area = mempool_alloc(&rwa_state->descriptor_pool, GFP_KERNEL);
>> +	*area = (typeof(*area)){
>> +		.size = size,
>> +		.buf = (char *)addr,
>> +	};
>
> That is an odd way to write that :)

Yeah, I'll change it.

>
>> +out:
>> +	pr_devel("%ps -> %s() -> buf=%p size=%zu\n",
>> +		 (void *)_RET_IP_, __func__, area->buf, area->size);
>
> Can we drop those? They need a recompile to enable, so if someone needs
> debugging they can just rewrite them - or use some sort of tracing
> instead.

Sure.


>> +machine_arch_initcall(pseries, rtas_work_area_allocator_init);
>
> Should it live in platforms/pseries then?

Yeah it probably ought to. I am pretty sure the "work area" construct is
PAPR-specific, and I haven't found any evidence that it's used on
non-pseries.


>> +/**
>> + * rtas_work_area_reserve_arena() - reserve memory suitable for RTAS work areas.
>> + */
>> +int __init rtas_work_area_reserve_arena(const phys_addr_t limit)
>> +{
>> +	const phys_addr_t align = RTAS_WORK_AREA_ARENA_ALIGN;
>> +	const phys_addr_t size = RTAS_WORK_AREA_ARENA_SZ;
>> +	const phys_addr_t min = MEMBLOCK_LOW_LIMIT;
>> +	const int nid = NUMA_NO_NODE;
>
> This should probably also be restricted to pseries?

Yes.


* Re: [PATCH v2 18/19] powerpc/rtas: introduce rtas_function_token() API
  2023-02-08 12:09   ` Michael Ellerman
@ 2023-02-08 15:44     ` Nathan Lynch
  0 siblings, 0 replies; 50+ messages in thread
From: Nathan Lynch @ 2023-02-08 15:44 UTC (permalink / raw)
  To: Michael Ellerman, Nicholas Piggin, Christophe Leroy, Kajol Jain,
	Laurent Dufour, Mahesh J Salgaonkar, Andrew Donnellan,
	Nick Child
  Cc: linuxppc-dev

Michael Ellerman <mpe@ellerman.id.au> writes:
> Nathan Lynch via B4 Submission Endpoint
> <devnull+nathanl.linux.ibm.com@kernel.org> writes:
>> From: Nathan Lynch <nathanl@linux.ibm.com>
>>
>> Users of rtas_token() supply a string argument that can't be validated
>> at build time. A typo or misspelling has to be caught by inspection or
>> by observing wrong behavior at runtime.
>>
>> Since the core RTAS code now has consolidated the names of all
>> possible RTAS functions and mapped them to their tokens, token lookup
>> can be implemented using symbolic constants to index a static array.
>>
>> So introduce rtas_function_token(), a replacement API which does that,
>> along with a rtas_service_present()-equivalent helper,
>> rtas_function_implemented(). Callers supply an opaque predefined
>> function handle which is used internally to index the function
>> table. Typos or other inappropriate arguments yield build errors, and
>> the function handle is a type that can't be easily confused with RTAS
>> tokens or other integer types.
>
> Why not go all the way and have the rtas_call() signature be:
>
>   int rtas_call(rtas_fn_handle_t fn, int, int, int *, ...);
>
>
> And have it do the token lookup internally? That way a caller can never
> inadvertently pass a random integer to rtas_call().
>
> And instead of eg:
>
> 	error = rtas_call(rtas_function_token(RTAS_FN_GET_TIME_OF_DAY), 0, 8, ret);
>
> we'd just need:
>
> 	error = rtas_call(RTAS_FN_GET_TIME_OF_DAY, 0, 8, ret);
>
>
> Doing the conversion all at once might be tricky. So maybe we need to add
> rtas_fn_call() which takes rtas_fn_handle_t so we can convert cases individually?
>
> Anyway just a thought. I guess we could merge this as-is and then do a
> further change to use rtas_fn_handle_t later.

You read my mind :-) But I want to go further and make the eventual
replacement for rtas_call() non-variadic, which will eliminate another
class of usage error.

Getting more ambitious: the ideal situation IMO would be that every use
of rtas_call() or its replacement is tidily contained in a C function in
kernel/rtas.c, where complexities like retries and error code
translation can be performed in a uniform way.

Anyway, a transition away from rtas_call(), whatever form it takes,
probably needs to happen incrementally.

>> diff --git a/arch/powerpc/kernel/rtas.c b/arch/powerpc/kernel/rtas.c
>> index 41c430dc40c2..17e59306ce63 100644
>> --- a/arch/powerpc/kernel/rtas.c
>> +++ b/arch/powerpc/kernel/rtas.c
>> @@ -453,6 +453,26 @@ static struct rtas_function rtas_function_table[] __ro_after_init = {
>>  	},
>>  };
>>  
>> +/**
>> + * rtas_function_token() - RTAS function token lookup.
>> + * @handle: Function handle, e.g. RTAS_FN_EVENT_SCAN.
>> + *
>> + * Context: Any context.
>> + * Return: the token value for the function if implemented by this platform,
>> + *         otherwise RTAS_UNKNOWN_SERVICE.
>> + */
>> +s32 rtas_function_token(const rtas_fn_handle_t handle)
>> +{
>> +	const size_t index = handle.index;
>> +	const bool out_of_bounds = index >= ARRAY_SIZE(rtas_function_table);
>> +
>> +	if (WARN_ONCE(out_of_bounds, "invalid function index %zu", index))
>> +		return RTAS_UNKNOWN_SERVICE;
>
> This needs:
>
> +	// If RTAS is not present or not initialised (yet) return unknown
> +	if (!rtas.dev)
> +		return RTAS_UNKNOWN_SERVICE;
> +
>
> Otherwise powernv breaks because it looks up tokens and gets back 0,
> because we never got as far as rtas_function_table_init() (to set all the
> tokens to RTAS_UNKNOWN_SERVICE), because we bailed out at the start of
> rtas_initialize() when we found no /rtas node.

Oh! OK, thanks.
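
A user-space sketch of the failure mode and the guard described above. The
table starts out zero-initialized, so a lookup made before the init routine
runs would return 0 unless the "firmware present" check comes first. All
example_* names, the two-entry table, and the token value 42 are invented
for illustration; the unknown-service value is assumed to be -1:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define EXAMPLE_UNKNOWN_SERVICE (-1)	/* assumed value */

/* Opaque handle: a struct wrapper cannot be passed where an int is expected. */
typedef struct {
	size_t index;
} example_fn_handle_t;

#define EXAMPLE_FN_EVENT_SCAN      ((example_fn_handle_t){ .index = 0 })
#define EXAMPLE_FN_GET_TIME_OF_DAY ((example_fn_handle_t){ .index = 1 })

struct example_function {
	const char *name;
	int32_t token;	/* 0 until example_platform_init() runs */
};

static struct example_function example_table[] = {
	{ .name = "event-scan" },
	{ .name = "get-time-of-day" },
};

#define EXAMPLE_TABLE_SIZE (sizeof(example_table) / sizeof(example_table[0]))

/* Hypothetical stand-in for the "rtas.dev" check. */
static bool example_firmware_present;

/* Mark every function unknown, then record the one token we "discovered". */
static void example_platform_init(void)
{
	for (size_t i = 0; i < EXAMPLE_TABLE_SIZE; i++)
		example_table[i].token = EXAMPLE_UNKNOWN_SERVICE;
	example_table[0].token = 42;	/* made-up token for event-scan */
	example_firmware_present = true;
}

static int32_t example_function_token(example_fn_handle_t handle)
{
	if (handle.index >= EXAMPLE_TABLE_SIZE)
		return EXAMPLE_UNKNOWN_SERVICE;
	/* Without this check, an uninitialized table would yield token 0. */
	if (!example_firmware_present)
		return EXAMPLE_UNKNOWN_SERVICE;
	return example_table[handle.index].token;
}
```

Before example_platform_init() is called, every lookup reports the unknown
service value instead of the zero left over from static initialization.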


* Re: [PATCH v2 01/19] powerpc/rtas: handle extended delays safely in early boot
  2023-02-08 13:14     ` Nathan Lynch
@ 2023-02-10  5:54       ` Michael Ellerman
  0 siblings, 0 replies; 50+ messages in thread
From: Michael Ellerman @ 2023-02-10  5:54 UTC (permalink / raw)
  To: Nathan Lynch, Nathan Lynch via B4 Submission Endpoint,
	Nicholas Piggin, Christophe Leroy, Kajol Jain, Laurent Dufour,
	Mahesh J Salgaonkar, Andrew Donnellan, Nick Child
  Cc: linuxppc-dev

Nathan Lynch <nathanl@linux.ibm.com> writes:
> Michael Ellerman <mpe@ellerman.id.au> writes:
>> Nathan Lynch via B4 Submission Endpoint <devnull+nathanl.linux.ibm.com@kernel.org> writes:
>>> From: Nathan Lynch <nathanl@linux.ibm.com>
>>>
>>> Some code that runs early in boot calls RTAS functions that can return
>>> -2 or 990x statuses, which mean the caller should retry. An example is
>>> pSeries_cmo_feature_init(), which invokes ibm,get-system-parameter but
>>> treats these benign statuses as errors instead of retrying.
>>>
>>> pSeries_cmo_feature_init() and similar code should be made to retry
>>> until they succeed or receive a real error, using the usual pattern:
>>>
>>> 	do {
>>> 		rc = rtas_call(token, etc...);
>>> 	} while (rtas_busy_delay(rc));
>>>
>>> But rtas_busy_delay() will perform a timed sleep on any 990x
>>> status. This isn't safe so early in boot, before the CPU scheduler and
>>> timer subsystem have initialized.
>>>
>>> The -2 RTAS status is much more likely to occur during single-threaded
>>> boot than 990x in practice, at least on PowerVM. This is because -2
>>> usually means that RTAS made progress but exhausted its self-imposed
>>> timeslice, while 990x is associated with concurrent requests from the
>>> OS causing internal contention. Regardless, according to the language
>>> in PAPR, the OS should be prepared to handle either type of status at
>>> any time.
>>>
>>> Add a fallback path to rtas_busy_delay() to handle this as safely as
>>> possible, performing a small delay on 990x. Include a counter to
>>> detect retry loops that aren't making progress and bail out.
>>>
>>> This was found by inspection and I'm not aware of any real
>>> failures. However, the implementation of rtas_busy_delay() before
>>> commit 38f7b7067dae ("powerpc/rtas: rtas_busy_delay() improvements")
>>> was not susceptible to this problem, so let's treat this as a
>>> regression.
>>>
>>> Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
>>> Fixes: 38f7b7067dae ("powerpc/rtas: rtas_busy_delay() improvements")
>>> ---
>>>  arch/powerpc/kernel/rtas.c | 48 +++++++++++++++++++++++++++++++++++++++++++++-
>>>  1 file changed, 47 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/arch/powerpc/kernel/rtas.c b/arch/powerpc/kernel/rtas.c
>>> index 795225d7f138..ec2df09a70cf 100644
>>> --- a/arch/powerpc/kernel/rtas.c
>>> +++ b/arch/powerpc/kernel/rtas.c
>>> @@ -606,6 +606,46 @@ unsigned int rtas_busy_delay_time(int status)
>>>  	return ms;
>>>  }
>>>  
>>> +/*
>>> + * Early boot fallback for rtas_busy_delay().
>>> + */
>>> +static bool __init rtas_busy_delay_early(int status)
>>> +{
>>> +	static size_t successive_ext_delays __initdata;
>>> +	bool ret;
>>
>> I think the logic would be easier to read if this was called "wait", but
>> maybe that's just me.
>
> Maybe "retry"? That communicates what the function is telling callers to do.

Yeah, that's even better.

cheers
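
For reference, a user-space sketch of the early-boot fallback logic the
commit message describes, using the "retry" name agreed on above. It is not
the patch's implementation: the 1000-iteration threshold, the empty delay
placeholder, and the choice to reset the counter on -2 and on final
statuses are assumptions, and all example_* names are hypothetical:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define EXAMPLE_BUSY		(-2)	/* "-2" status from the commit message */
#define EXAMPLE_EXT_DELAY_MIN	9900	/* "990x" statuses */
#define EXAMPLE_EXT_DELAY_MAX	9905
#define EXAMPLE_MAX_EXT_DELAYS	1000	/* hypothetical bail-out threshold */

static size_t example_successive_ext_delays;

/* Placeholder for a short busy-wait; no sleeping this early in boot. */
static void example_short_delay(void)
{
}

/* Returns true if the caller should call the firmware function again. */
static bool example_busy_delay_early(int status)
{
	bool retry;

	if (status >= EXAMPLE_EXT_DELAY_MIN && status <= EXAMPLE_EXT_DELAY_MAX) {
		example_short_delay();
		retry = true;
		example_successive_ext_delays++;
		/* Too many delays in a row: assume no progress and give up. */
		if (example_successive_ext_delays > EXAMPLE_MAX_EXT_DELAYS) {
			retry = false;
			example_successive_ext_delays = 0;
		}
	} else if (status == EXAMPLE_BUSY) {
		/* Firmware made progress; retry immediately. */
		retry = true;
		example_successive_ext_delays = 0;
	} else {
		/* Success or a real error: stop retrying. */
		retry = false;
		example_successive_ext_delays = 0;
	}

	return retry;
}
```

A caller would use it in the usual loop form, repeating the firmware call
while example_busy_delay_early(rc) returns true.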


* Re: [PATCH v2 11/19] powerpc/rtas: add work area allocator
  2023-02-08 14:48     ` Nathan Lynch
@ 2023-02-10  6:07       ` Michael Ellerman
  0 siblings, 0 replies; 50+ messages in thread
From: Michael Ellerman @ 2023-02-10  6:07 UTC (permalink / raw)
  To: Nathan Lynch, Nathan Lynch via B4 Submission Endpoint,
	Nicholas Piggin, Christophe Leroy, Kajol Jain, Laurent Dufour,
	Mahesh J Salgaonkar, Andrew Donnellan, Nick Child
  Cc: linuxppc-dev

Nathan Lynch <nathanl@linux.ibm.com> writes:
> Michael Ellerman <mpe@ellerman.id.au> writes:
>> Nathan Lynch via B4 Submission Endpoint
>> <devnull+nathanl.linux.ibm.com@kernel.org> writes:
...
>>> +struct rtas_work_area * __ref rtas_work_area_alloc(size_t size)
>>> +{
>>> +	struct rtas_work_area *area;
>>> +	unsigned long addr;
>>> +
>>> +	might_sleep();
>>> +
>>> +	WARN_ON(size > RTAS_WORK_AREA_MAX_ALLOC_SZ);
>>> +	size = min_t(size_t, size, RTAS_WORK_AREA_MAX_ALLOC_SZ);
>>
>> This seems unsafe.
>>
>> If you return a buffer smaller than the caller asks for they're likely
>> to read/write past the end of it and corrupt memory.
>
> OK, let's figure out another way to handle this.
>
>> AFAIK genalloc doesn't have guard pages or anything fancy to save us
>> from that - but maybe I'm wrong, I've never used it.
>
> Yeah we would have to build our own thing on top of it. And I don't
> think it could be something that traps on access, it would have to be a
> check in rtas_work_area_free(), after the fact.

I *think* we could use the MMU. We'd just have to allocate whole pages,
and then vmap() them (create a mapping in vmalloc space), and then give
the vmalloc space address back to the caller. They'd then operate on
that address, meaning any overflow would trap. You already have
rtas_work_area_phys() for passing the phys address to RTAS.

But that would be a lot more complicated than your suggestion below.

>> There's only three callers in the end, seems like we should just return
>> NULL if the size is too large and have callers check the return value.
>
> There are more conversions to do, and a property I hope to maintain is
> that requests can't fail. Existing users of rtas_data_buf don't have
> error paths for failure to acquire the buffer.
>
> I believe the allocation size passed to rtas_work_area_alloc() can be
> known at build time in all cases. Maybe we could prevent inappropriate
> requests from being built with a compile-time assertion (untested):
>
> /* rtas-work-area.h */
>
> static inline struct rtas_work_area *rtas_work_area_alloc(size_t sz)
> {
> 	static_assert(sz < RTAS_WORK_AREA_MAX_ALLOC_SZ);
> 	return __rtas_work_area_alloc(sz);
> }
>
> I think this would be OK? If I can't make it work I'll fall back to
> returning NULL as you suggest, but it will make for more churn (and
> risk) in the conversions.

Yeah if the sizes are always known at compile time that is a much better
solution.

cheers


end of thread, other threads:[~2023-02-10  6:08 UTC | newest]

Thread overview: 50+ messages
2023-02-06 18:54 [PATCH v2 00/19] RTAS maintenance Nathan Lynch
2023-02-06 18:54 ` Nathan Lynch via B4 Submission Endpoint
2023-02-06 18:54 ` [PATCH v2 01/19] powerpc/rtas: handle extended delays safely in early boot Nathan Lynch
2023-02-06 18:54   ` Nathan Lynch via B4 Submission Endpoint
2023-02-08 11:20   ` Michael Ellerman
2023-02-08 13:14     ` Nathan Lynch
2023-02-10  5:54       ` Michael Ellerman
2023-02-06 18:54 ` [PATCH v2 02/19] powerpc/perf/hv-24x7: add missing RTAS retry status handling Nathan Lynch via B4 Submission Endpoint
2023-02-06 18:54   ` Nathan Lynch
2023-02-06 18:54 ` [PATCH v2 03/19] powerpc/pseries/lpar: " Nathan Lynch
2023-02-06 18:54   ` Nathan Lynch via B4 Submission Endpoint
2023-02-06 18:54 ` [PATCH v2 04/19] powerpc/pseries/lparcfg: " Nathan Lynch via B4 Submission Endpoint
2023-02-06 18:54   ` Nathan Lynch
2023-02-06 18:54 ` [PATCH v2 05/19] powerpc/pseries/setup: " Nathan Lynch
2023-02-06 18:54   ` Nathan Lynch via B4 Submission Endpoint
2023-02-06 18:54 ` [PATCH v2 06/19] powerpc/pseries: drop RTAS-based timebase synchronization Nathan Lynch
2023-02-06 18:54   ` Nathan Lynch via B4 Submission Endpoint
2023-02-06 18:54 ` [PATCH v2 07/19] powerpc/rtas: improve function information lookups Nathan Lynch via B4 Submission Endpoint
2023-02-06 18:54   ` Nathan Lynch
2023-02-08 11:57   ` Michael Ellerman
2023-02-08 13:16     ` Nathan Lynch
2023-02-06 18:54 ` [PATCH v2 08/19] powerpc/rtas: strengthen do_enter_rtas() type safety, drop inline Nathan Lynch
2023-02-06 18:54   ` Nathan Lynch via B4 Submission Endpoint
2023-02-06 18:54 ` [PATCH v2 09/19] powerpc/tracing: tracepoints for RTAS entry and exit Nathan Lynch
2023-02-06 18:54   ` Nathan Lynch via B4 Submission Endpoint
2023-02-06 18:54 ` [PATCH v2 10/19] powerpc/rtas: add tracepoints around RTAS entry Nathan Lynch
2023-02-06 18:54   ` Nathan Lynch via B4 Submission Endpoint
2023-02-06 18:54 ` [PATCH v2 11/19] powerpc/rtas: add work area allocator Nathan Lynch
2023-02-06 18:54   ` Nathan Lynch via B4 Submission Endpoint
2023-02-08 11:58   ` Michael Ellerman
2023-02-08 14:48     ` Nathan Lynch
2023-02-10  6:07       ` Michael Ellerman
2023-02-06 18:54 ` [PATCH v2 12/19] powerpc/pseries/dlpar: use RTAS work area API Nathan Lynch
2023-02-06 18:54   ` Nathan Lynch via B4 Submission Endpoint
2023-02-06 18:54 ` [PATCH v2 13/19] powerpc/pseries: PAPR system parameter API Nathan Lynch
2023-02-06 18:54   ` Nathan Lynch via B4 Submission Endpoint
2023-02-06 18:54 ` [PATCH v2 14/19] powerpc/pseries: convert CMO probe to papr_sysparm API Nathan Lynch
2023-02-06 18:54   ` Nathan Lynch via B4 Submission Endpoint
2023-02-06 18:54 ` [PATCH v2 15/19] powerpc/pseries/lparcfg: convert " Nathan Lynch
2023-02-06 18:54   ` Nathan Lynch via B4 Submission Endpoint
2023-02-06 18:54 ` [PATCH v2 16/19] powerpc/pseries/hv-24x7: " Nathan Lynch via B4 Submission Endpoint
2023-02-06 18:54   ` Nathan Lynch
2023-02-06 18:54 ` [PATCH v2 17/19] powerpc/pseries/lpar: " Nathan Lynch
2023-02-06 18:54   ` Nathan Lynch via B4 Submission Endpoint
2023-02-06 18:54 ` [PATCH v2 18/19] powerpc/rtas: introduce rtas_function_token() API Nathan Lynch
2023-02-06 18:54   ` Nathan Lynch via B4 Submission Endpoint
2023-02-08 12:09   ` Michael Ellerman
2023-02-08 15:44     ` Nathan Lynch
2023-02-06 18:54 ` [PATCH v2 19/19] powerpc/rtas: arch-wide function token lookup conversions Nathan Lynch
2023-02-06 18:54   ` Nathan Lynch via B4 Submission Endpoint
