linuxppc-dev.lists.ozlabs.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] Revert "powerpc/rtas: Implement reentrant rtas call"
@ 2022-09-07 22:01 Nathan Lynch
  2022-09-08  7:56 ` Laurent Dufour
                   ` (2 more replies)
  0 siblings, 3 replies; 15+ messages in thread
From: Nathan Lynch @ 2022-09-07 22:01 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: leobras.c, ldufour, haren, tyreld

At the time this was submitted by Leonardo, I confirmed -- or thought
I had confirmed -- with PowerVM partition firmware development that
the following RTAS functions:

- ibm,get-xive
- ibm,int-off
- ibm,int-on
- ibm,set-xive

were safe to call on multiple CPUs simultaneously, not only with
respect to themselves as indicated by PAPR, but with arbitrary other
RTAS calls:

https://lore.kernel.org/linuxppc-dev/875zcy2v8o.fsf@linux.ibm.com/

Recent discussion with firmware development makes it clear that this
is not true, and that the code in commit b664db8e3f97 ("powerpc/rtas:
Implement reentrant rtas call") is unsafe, likely explaining several
strange bugs we've seen in internal testing involving DLPAR and
LPM. These scenarios use ibm,configure-connector, whose internal state
can be corrupted by the concurrent use of the "reentrant" functions,
leading to symptoms like endless busy statuses from RTAS.

Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
Fixes: b664db8e3f97 ("powerpc/rtas: Implement reentrant rtas call")
Cc: stable@vger.kernel.org
---
 arch/powerpc/include/asm/paca.h     |  1 -
 arch/powerpc/include/asm/rtas.h     |  1 -
 arch/powerpc/kernel/paca.c          | 32 -----------------
 arch/powerpc/kernel/rtas.c          | 54 -----------------------------
 arch/powerpc/sysdev/xics/ics-rtas.c | 22 ++++++------
 5 files changed, 11 insertions(+), 99 deletions(-)

diff --git a/arch/powerpc/include/asm/paca.h b/arch/powerpc/include/asm/paca.h
index 4d7aaab82702..3537b0500f4d 100644
--- a/arch/powerpc/include/asm/paca.h
+++ b/arch/powerpc/include/asm/paca.h
@@ -263,7 +263,6 @@ struct paca_struct {
 	u64 l1d_flush_size;
 #endif
 #ifdef CONFIG_PPC_PSERIES
-	struct rtas_args *rtas_args_reentrant;
 	u8 *mce_data_buf;		/* buffer to hold per cpu rtas errlog */
 #endif /* CONFIG_PPC_PSERIES */
 
diff --git a/arch/powerpc/include/asm/rtas.h b/arch/powerpc/include/asm/rtas.h
index 00531af17ce0..56319aea646e 100644
--- a/arch/powerpc/include/asm/rtas.h
+++ b/arch/powerpc/include/asm/rtas.h
@@ -240,7 +240,6 @@ extern struct rtas_t rtas;
 extern int rtas_token(const char *service);
 extern int rtas_service_present(const char *service);
 extern int rtas_call(int token, int, int, int *, ...);
-int rtas_call_reentrant(int token, int nargs, int nret, int *outputs, ...);
 void rtas_call_unlocked(struct rtas_args *args, int token, int nargs,
 			int nret, ...);
 extern void __noreturn rtas_restart(char *cmd);
diff --git a/arch/powerpc/kernel/paca.c b/arch/powerpc/kernel/paca.c
index ba593fd60124..dfd097b79160 100644
--- a/arch/powerpc/kernel/paca.c
+++ b/arch/powerpc/kernel/paca.c
@@ -16,7 +16,6 @@
 #include <asm/kexec.h>
 #include <asm/svm.h>
 #include <asm/ultravisor.h>
-#include <asm/rtas.h>
 
 #include "setup.h"
 
@@ -170,30 +169,6 @@ static struct slb_shadow * __init new_slb_shadow(int cpu, unsigned long limit)
 }
 #endif /* CONFIG_PPC_64S_HASH_MMU */
 
-#ifdef CONFIG_PPC_PSERIES
-/**
- * new_rtas_args() - Allocates rtas args
- * @cpu:	CPU number
- * @limit:	Memory limit for this allocation
- *
- * Allocates a struct rtas_args and return it's pointer,
- * if not in Hypervisor mode
- *
- * Return:	Pointer to allocated rtas_args
- *		NULL if CPU in Hypervisor Mode
- */
-static struct rtas_args * __init new_rtas_args(int cpu, unsigned long limit)
-{
-	limit = min_t(unsigned long, limit, RTAS_INSTANTIATE_MAX);
-
-	if (early_cpu_has_feature(CPU_FTR_HVMODE))
-		return NULL;
-
-	return alloc_paca_data(sizeof(struct rtas_args), L1_CACHE_BYTES,
-			       limit, cpu);
-}
-#endif /* CONFIG_PPC_PSERIES */
-
 /* The Paca is an array with one entry per processor.  Each contains an
  * lppaca, which contains the information shared between the
  * hypervisor and Linux.
@@ -232,10 +207,6 @@ void __init initialise_paca(struct paca_struct *new_paca, int cpu)
 	/* For now -- if we have threads this will be adjusted later */
 	new_paca->tcd_ptr = &new_paca->tcd;
 #endif
-
-#ifdef CONFIG_PPC_PSERIES
-	new_paca->rtas_args_reentrant = NULL;
-#endif
 }
 
 /* Put the paca pointer into r13 and SPRG_PACA */
@@ -307,9 +278,6 @@ void __init allocate_paca(int cpu)
 #endif
 #ifdef CONFIG_PPC_64S_HASH_MMU
 	paca->slb_shadow_ptr = new_slb_shadow(cpu, limit);
-#endif
-#ifdef CONFIG_PPC_PSERIES
-	paca->rtas_args_reentrant = new_rtas_args(cpu, limit);
 #endif
 	paca_struct_size += sizeof(struct paca_struct);
 }
diff --git a/arch/powerpc/kernel/rtas.c b/arch/powerpc/kernel/rtas.c
index 693133972294..0b8a858aa847 100644
--- a/arch/powerpc/kernel/rtas.c
+++ b/arch/powerpc/kernel/rtas.c
@@ -43,7 +43,6 @@
 #include <asm/time.h>
 #include <asm/mmu.h>
 #include <asm/topology.h>
-#include <asm/paca.h>
 
 /* This is here deliberately so it's only used in this file */
 void enter_rtas(unsigned long);
@@ -932,59 +931,6 @@ void rtas_activate_firmware(void)
 		pr_err("ibm,activate-firmware failed (%i)\n", fwrc);
 }
 
-#ifdef CONFIG_PPC_PSERIES
-/**
- * rtas_call_reentrant() - Used for reentrant rtas calls
- * @token:	Token for desired reentrant RTAS call
- * @nargs:	Number of Input Parameters
- * @nret:	Number of Output Parameters
- * @outputs:	Array of outputs
- * @...:	Inputs for desired RTAS call
- *
- * According to LoPAR documentation, only "ibm,int-on", "ibm,int-off",
- * "ibm,get-xive" and "ibm,set-xive" are currently reentrant.
- * Reentrant calls need their own rtas_args buffer, so not using rtas.args, but
- * PACA one instead.
- *
- * Return:	-1 on error,
- *		First output value of RTAS call if (nret > 0),
- *		0 otherwise,
- */
-int rtas_call_reentrant(int token, int nargs, int nret, int *outputs, ...)
-{
-	va_list list;
-	struct rtas_args *args;
-	unsigned long flags;
-	int i, ret = 0;
-
-	if (!rtas.entry || token == RTAS_UNKNOWN_SERVICE)
-		return -1;
-
-	local_irq_save(flags);
-	preempt_disable();
-
-	/* We use the per-cpu (PACA) rtas args buffer */
-	args = local_paca->rtas_args_reentrant;
-
-	va_start(list, outputs);
-	va_rtas_call_unlocked(args, token, nargs, nret, list);
-	va_end(list);
-
-	if (nret > 1 && outputs)
-		for (i = 0; i < nret - 1; ++i)
-			outputs[i] = be32_to_cpu(args->rets[i + 1]);
-
-	if (nret > 0)
-		ret = be32_to_cpu(args->rets[0]);
-
-	local_irq_restore(flags);
-	preempt_enable();
-
-	return ret;
-}
-
-#endif /* CONFIG_PPC_PSERIES */
-
 /**
  * get_pseries_errorlog() - Find a specific pseries error log in an RTAS
  *                          extended event log.
diff --git a/arch/powerpc/sysdev/xics/ics-rtas.c b/arch/powerpc/sysdev/xics/ics-rtas.c
index 9e7007f9aca5..f8320f8e5bc7 100644
--- a/arch/powerpc/sysdev/xics/ics-rtas.c
+++ b/arch/powerpc/sysdev/xics/ics-rtas.c
@@ -36,8 +36,8 @@ static void ics_rtas_unmask_irq(struct irq_data *d)
 
 	server = xics_get_irq_server(d->irq, irq_data_get_affinity_mask(d), 0);
 
-	call_status = rtas_call_reentrant(ibm_set_xive, 3, 1, NULL, hw_irq,
-					  server, DEFAULT_PRIORITY);
+	call_status = rtas_call(ibm_set_xive, 3, 1, NULL, hw_irq, server,
+				DEFAULT_PRIORITY);
 	if (call_status != 0) {
 		printk(KERN_ERR
 			"%s: ibm_set_xive irq %u server %x returned %d\n",
@@ -46,7 +46,7 @@ static void ics_rtas_unmask_irq(struct irq_data *d)
 	}
 
 	/* Now unmask the interrupt (often a no-op) */
-	call_status = rtas_call_reentrant(ibm_int_on, 1, 1, NULL, hw_irq);
+	call_status = rtas_call(ibm_int_on, 1, 1, NULL, hw_irq);
 	if (call_status != 0) {
 		printk(KERN_ERR "%s: ibm_int_on irq=%u returned %d\n",
 			__func__, hw_irq, call_status);
@@ -68,7 +68,7 @@ static void ics_rtas_mask_real_irq(unsigned int hw_irq)
 	if (hw_irq == XICS_IPI)
 		return;
 
-	call_status = rtas_call_reentrant(ibm_int_off, 1, 1, NULL, hw_irq);
+	call_status = rtas_call(ibm_int_off, 1, 1, NULL, hw_irq);
 	if (call_status != 0) {
 		printk(KERN_ERR "%s: ibm_int_off irq=%u returned %d\n",
 			__func__, hw_irq, call_status);
@@ -76,8 +76,8 @@ static void ics_rtas_mask_real_irq(unsigned int hw_irq)
 	}
 
 	/* Have to set XIVE to 0xff to be able to remove a slot */
-	call_status = rtas_call_reentrant(ibm_set_xive, 3, 1, NULL, hw_irq,
-					  xics_default_server, 0xff);
+	call_status = rtas_call(ibm_set_xive, 3, 1, NULL, hw_irq,
+				xics_default_server, 0xff);
 	if (call_status != 0) {
 		printk(KERN_ERR "%s: ibm_set_xive(0xff) irq=%u returned %d\n",
 			__func__, hw_irq, call_status);
@@ -108,7 +108,7 @@ static int ics_rtas_set_affinity(struct irq_data *d,
 	if (hw_irq == XICS_IPI || hw_irq == XICS_IRQ_SPURIOUS)
 		return -1;
 
-	status = rtas_call_reentrant(ibm_get_xive, 1, 3, xics_status, hw_irq);
+	status = rtas_call(ibm_get_xive, 1, 3, xics_status, hw_irq);
 
 	if (status) {
 		printk(KERN_ERR "%s: ibm,get-xive irq=%u returns %d\n",
@@ -126,8 +126,8 @@ static int ics_rtas_set_affinity(struct irq_data *d,
 	pr_debug("%s: irq %d [hw 0x%x] server: 0x%x\n", __func__, d->irq,
 		 hw_irq, irq_server);
 
-	status = rtas_call_reentrant(ibm_set_xive, 3, 1, NULL,
-				     hw_irq, irq_server, xics_status[1]);
+	status = rtas_call(ibm_set_xive, 3, 1, NULL,
+			   hw_irq, irq_server, xics_status[1]);
 
 	if (status) {
 		printk(KERN_ERR "%s: ibm,set-xive irq=%u returns %d\n",
@@ -158,7 +158,7 @@ static int ics_rtas_check(struct ics *ics, unsigned int hw_irq)
 		return -EINVAL;
 
 	/* Check if RTAS knows about this interrupt */
-	rc = rtas_call_reentrant(ibm_get_xive, 1, 3, status, hw_irq);
+	rc = rtas_call(ibm_get_xive, 1, 3, status, hw_irq);
 	if (rc)
 		return -ENXIO;
 
@@ -174,7 +174,7 @@ static long ics_rtas_get_server(struct ics *ics, unsigned long vec)
 {
 	int rc, status[2];
 
-	rc = rtas_call_reentrant(ibm_get_xive, 1, 3, status, vec);
+	rc = rtas_call(ibm_get_xive, 1, 3, status, vec);
 	if (rc)
 		return -1;
 	return status[0];
-- 
2.37.2


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH] Revert "powerpc/rtas: Implement reentrant rtas call"
  2022-09-07 22:01 [PATCH] Revert "powerpc/rtas: Implement reentrant rtas call" Nathan Lynch
@ 2022-09-08  7:56 ` Laurent Dufour
       [not found] ` <1d76891ee052112ee1547a4027e358d5cbcac23d.camel@gmail.com>
  2022-09-23 10:57 ` Michael Ellerman
  2 siblings, 0 replies; 15+ messages in thread
From: Laurent Dufour @ 2022-09-08  7:56 UTC (permalink / raw)
  To: Nathan Lynch, linuxppc-dev; +Cc: leobras.c, haren, tyreld

Le 08/09/2022 à 00:01, Nathan Lynch a écrit :
> At the time this was submitted by Leonardo, I confirmed -- or thought
> I had confirmed -- with PowerVM partition firmware development that
> the following RTAS functions:
> 
> - ibm,get-xive
> - ibm,int-off
> - ibm,int-on
> - ibm,set-xive
> 
> were safe to call on multiple CPUs simultaneously, not only with
> respect to themselves as indicated by PAPR, but with arbitrary other
> RTAS calls:
> 
> https://lore.kernel.org/linuxppc-dev/875zcy2v8o.fsf@linux.ibm.com/
> 
> Recent discussion with firmware development makes it clear that this
> is not true, and that the code in commit b664db8e3f97 ("powerpc/rtas:
> Implement reentrant rtas call") is unsafe, likely explaining several
> strange bugs we've seen in internal testing involving DLPAR and
> LPM. These scenarios use ibm,configure-connector, whose internal state
> can be corrupted by the concurrent use of the "reentrant" functions,
> leading to symptoms like endless busy statuses from RTAS.

Thanks, Nathan,
T
his is fixing LPAR hangs I was facing when doing some migration tests.

Reviewed-by: Laurent Dufour <laurent.dufour@fr.ibm.com>

> Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
> Fixes: b664db8e3f97 ("powerpc/rtas: Implement reentrant rtas call")
> Cc: stable@vger.kernel.org
> ---
>  arch/powerpc/include/asm/paca.h     |  1 -
>  arch/powerpc/include/asm/rtas.h     |  1 -
>  arch/powerpc/kernel/paca.c          | 32 -----------------
>  arch/powerpc/kernel/rtas.c          | 54 -----------------------------
>  arch/powerpc/sysdev/xics/ics-rtas.c | 22 ++++++------
>  5 files changed, 11 insertions(+), 99 deletions(-)
> 
> diff --git a/arch/powerpc/include/asm/paca.h b/arch/powerpc/include/asm/paca.h
> index 4d7aaab82702..3537b0500f4d 100644
> --- a/arch/powerpc/include/asm/paca.h
> +++ b/arch/powerpc/include/asm/paca.h
> @@ -263,7 +263,6 @@ struct paca_struct {
>  	u64 l1d_flush_size;
>  #endif
>  #ifdef CONFIG_PPC_PSERIES
> -	struct rtas_args *rtas_args_reentrant;
>  	u8 *mce_data_buf;		/* buffer to hold per cpu rtas errlog */
>  #endif /* CONFIG_PPC_PSERIES */
>  
> diff --git a/arch/powerpc/include/asm/rtas.h b/arch/powerpc/include/asm/rtas.h
> index 00531af17ce0..56319aea646e 100644
> --- a/arch/powerpc/include/asm/rtas.h
> +++ b/arch/powerpc/include/asm/rtas.h
> @@ -240,7 +240,6 @@ extern struct rtas_t rtas;
>  extern int rtas_token(const char *service);
>  extern int rtas_service_present(const char *service);
>  extern int rtas_call(int token, int, int, int *, ...);
> -int rtas_call_reentrant(int token, int nargs, int nret, int *outputs, ...);
>  void rtas_call_unlocked(struct rtas_args *args, int token, int nargs,
>  			int nret, ...);
>  extern void __noreturn rtas_restart(char *cmd);
> diff --git a/arch/powerpc/kernel/paca.c b/arch/powerpc/kernel/paca.c
> index ba593fd60124..dfd097b79160 100644
> --- a/arch/powerpc/kernel/paca.c
> +++ b/arch/powerpc/kernel/paca.c
> @@ -16,7 +16,6 @@
>  #include <asm/kexec.h>
>  #include <asm/svm.h>
>  #include <asm/ultravisor.h>
> -#include <asm/rtas.h>
>  
>  #include "setup.h"
>  
> @@ -170,30 +169,6 @@ static struct slb_shadow * __init new_slb_shadow(int cpu, unsigned long limit)
>  }
>  #endif /* CONFIG_PPC_64S_HASH_MMU */
>  
> -#ifdef CONFIG_PPC_PSERIES
> -/**
> - * new_rtas_args() - Allocates rtas args
> - * @cpu:	CPU number
> - * @limit:	Memory limit for this allocation
> - *
> - * Allocates a struct rtas_args and return it's pointer,
> - * if not in Hypervisor mode
> - *
> - * Return:	Pointer to allocated rtas_args
> - *		NULL if CPU in Hypervisor Mode
> - */
> -static struct rtas_args * __init new_rtas_args(int cpu, unsigned long limit)
> -{
> -	limit = min_t(unsigned long, limit, RTAS_INSTANTIATE_MAX);
> -
> -	if (early_cpu_has_feature(CPU_FTR_HVMODE))
> -		return NULL;
> -
> -	return alloc_paca_data(sizeof(struct rtas_args), L1_CACHE_BYTES,
> -			       limit, cpu);
> -}
> -#endif /* CONFIG_PPC_PSERIES */
> -
>  /* The Paca is an array with one entry per processor.  Each contains an
>   * lppaca, which contains the information shared between the
>   * hypervisor and Linux.
> @@ -232,10 +207,6 @@ void __init initialise_paca(struct paca_struct *new_paca, int cpu)
>  	/* For now -- if we have threads this will be adjusted later */
>  	new_paca->tcd_ptr = &new_paca->tcd;
>  #endif
> -
> -#ifdef CONFIG_PPC_PSERIES
> -	new_paca->rtas_args_reentrant = NULL;
> -#endif
>  }
>  
>  /* Put the paca pointer into r13 and SPRG_PACA */
> @@ -307,9 +278,6 @@ void __init allocate_paca(int cpu)
>  #endif
>  #ifdef CONFIG_PPC_64S_HASH_MMU
>  	paca->slb_shadow_ptr = new_slb_shadow(cpu, limit);
> -#endif
> -#ifdef CONFIG_PPC_PSERIES
> -	paca->rtas_args_reentrant = new_rtas_args(cpu, limit);
>  #endif
>  	paca_struct_size += sizeof(struct paca_struct);
>  }
> diff --git a/arch/powerpc/kernel/rtas.c b/arch/powerpc/kernel/rtas.c
> index 693133972294..0b8a858aa847 100644
> --- a/arch/powerpc/kernel/rtas.c
> +++ b/arch/powerpc/kernel/rtas.c
> @@ -43,7 +43,6 @@
>  #include <asm/time.h>
>  #include <asm/mmu.h>
>  #include <asm/topology.h>
> -#include <asm/paca.h>
>  
>  /* This is here deliberately so it's only used in this file */
>  void enter_rtas(unsigned long);
> @@ -932,59 +931,6 @@ void rtas_activate_firmware(void)
>  		pr_err("ibm,activate-firmware failed (%i)\n", fwrc);
>  }
>  
> -#ifdef CONFIG_PPC_PSERIES
> -/**
> - * rtas_call_reentrant() - Used for reentrant rtas calls
> - * @token:	Token for desired reentrant RTAS call
> - * @nargs:	Number of Input Parameters
> - * @nret:	Number of Output Parameters
> - * @outputs:	Array of outputs
> - * @...:	Inputs for desired RTAS call
> - *
> - * According to LoPAR documentation, only "ibm,int-on", "ibm,int-off",
> - * "ibm,get-xive" and "ibm,set-xive" are currently reentrant.
> - * Reentrant calls need their own rtas_args buffer, so not using rtas.args, but
> - * PACA one instead.
> - *
> - * Return:	-1 on error,
> - *		First output value of RTAS call if (nret > 0),
> - *		0 otherwise,
> - */
> -int rtas_call_reentrant(int token, int nargs, int nret, int *outputs, ...)
> -{
> -	va_list list;
> -	struct rtas_args *args;
> -	unsigned long flags;
> -	int i, ret = 0;
> -
> -	if (!rtas.entry || token == RTAS_UNKNOWN_SERVICE)
> -		return -1;
> -
> -	local_irq_save(flags);
> -	preempt_disable();
> -
> -	/* We use the per-cpu (PACA) rtas args buffer */
> -	args = local_paca->rtas_args_reentrant;
> -
> -	va_start(list, outputs);
> -	va_rtas_call_unlocked(args, token, nargs, nret, list);
> -	va_end(list);
> -
> -	if (nret > 1 && outputs)
> -		for (i = 0; i < nret - 1; ++i)
> -			outputs[i] = be32_to_cpu(args->rets[i + 1]);
> -
> -	if (nret > 0)
> -		ret = be32_to_cpu(args->rets[0]);
> -
> -	local_irq_restore(flags);
> -	preempt_enable();
> -
> -	return ret;
> -}
> -
> -#endif /* CONFIG_PPC_PSERIES */
> -
>  /**
>   * get_pseries_errorlog() - Find a specific pseries error log in an RTAS
>   *                          extended event log.
> diff --git a/arch/powerpc/sysdev/xics/ics-rtas.c b/arch/powerpc/sysdev/xics/ics-rtas.c
> index 9e7007f9aca5..f8320f8e5bc7 100644
> --- a/arch/powerpc/sysdev/xics/ics-rtas.c
> +++ b/arch/powerpc/sysdev/xics/ics-rtas.c
> @@ -36,8 +36,8 @@ static void ics_rtas_unmask_irq(struct irq_data *d)
>  
>  	server = xics_get_irq_server(d->irq, irq_data_get_affinity_mask(d), 0);
>  
> -	call_status = rtas_call_reentrant(ibm_set_xive, 3, 1, NULL, hw_irq,
> -					  server, DEFAULT_PRIORITY);
> +	call_status = rtas_call(ibm_set_xive, 3, 1, NULL, hw_irq, server,
> +				DEFAULT_PRIORITY);
>  	if (call_status != 0) {
>  		printk(KERN_ERR
>  			"%s: ibm_set_xive irq %u server %x returned %d\n",
> @@ -46,7 +46,7 @@ static void ics_rtas_unmask_irq(struct irq_data *d)
>  	}
>  
>  	/* Now unmask the interrupt (often a no-op) */
> -	call_status = rtas_call_reentrant(ibm_int_on, 1, 1, NULL, hw_irq);
> +	call_status = rtas_call(ibm_int_on, 1, 1, NULL, hw_irq);
>  	if (call_status != 0) {
>  		printk(KERN_ERR "%s: ibm_int_on irq=%u returned %d\n",
>  			__func__, hw_irq, call_status);
> @@ -68,7 +68,7 @@ static void ics_rtas_mask_real_irq(unsigned int hw_irq)
>  	if (hw_irq == XICS_IPI)
>  		return;
>  
> -	call_status = rtas_call_reentrant(ibm_int_off, 1, 1, NULL, hw_irq);
> +	call_status = rtas_call(ibm_int_off, 1, 1, NULL, hw_irq);
>  	if (call_status != 0) {
>  		printk(KERN_ERR "%s: ibm_int_off irq=%u returned %d\n",
>  			__func__, hw_irq, call_status);
> @@ -76,8 +76,8 @@ static void ics_rtas_mask_real_irq(unsigned int hw_irq)
>  	}
>  
>  	/* Have to set XIVE to 0xff to be able to remove a slot */
> -	call_status = rtas_call_reentrant(ibm_set_xive, 3, 1, NULL, hw_irq,
> -					  xics_default_server, 0xff);
> +	call_status = rtas_call(ibm_set_xive, 3, 1, NULL, hw_irq,
> +				xics_default_server, 0xff);
>  	if (call_status != 0) {
>  		printk(KERN_ERR "%s: ibm_set_xive(0xff) irq=%u returned %d\n",
>  			__func__, hw_irq, call_status);
> @@ -108,7 +108,7 @@ static int ics_rtas_set_affinity(struct irq_data *d,
>  	if (hw_irq == XICS_IPI || hw_irq == XICS_IRQ_SPURIOUS)
>  		return -1;
>  
> -	status = rtas_call_reentrant(ibm_get_xive, 1, 3, xics_status, hw_irq);
> +	status = rtas_call(ibm_get_xive, 1, 3, xics_status, hw_irq);
>  
>  	if (status) {
>  		printk(KERN_ERR "%s: ibm,get-xive irq=%u returns %d\n",
> @@ -126,8 +126,8 @@ static int ics_rtas_set_affinity(struct irq_data *d,
>  	pr_debug("%s: irq %d [hw 0x%x] server: 0x%x\n", __func__, d->irq,
>  		 hw_irq, irq_server);
>  
> -	status = rtas_call_reentrant(ibm_set_xive, 3, 1, NULL,
> -				     hw_irq, irq_server, xics_status[1]);
> +	status = rtas_call(ibm_set_xive, 3, 1, NULL,
> +			   hw_irq, irq_server, xics_status[1]);
>  
>  	if (status) {
>  		printk(KERN_ERR "%s: ibm,set-xive irq=%u returns %d\n",
> @@ -158,7 +158,7 @@ static int ics_rtas_check(struct ics *ics, unsigned int hw_irq)
>  		return -EINVAL;
>  
>  	/* Check if RTAS knows about this interrupt */
> -	rc = rtas_call_reentrant(ibm_get_xive, 1, 3, status, hw_irq);
> +	rc = rtas_call(ibm_get_xive, 1, 3, status, hw_irq);
>  	if (rc)
>  		return -ENXIO;
>  
> @@ -174,7 +174,7 @@ static long ics_rtas_get_server(struct ics *ics, unsigned long vec)
>  {
>  	int rc, status[2];
>  
> -	rc = rtas_call_reentrant(ibm_get_xive, 1, 3, status, vec);
> +	rc = rtas_call(ibm_get_xive, 1, 3, status, vec);
>  	if (rc)
>  		return -1;
>  	return status[0];


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] Revert "powerpc/rtas: Implement reentrant rtas call"
       [not found] ` <1d76891ee052112ee1547a4027e358d5cbcac23d.camel@gmail.com>
@ 2022-09-09 14:04   ` Nathan Lynch
  2022-09-12 15:22     ` Leonardo Brás
  0 siblings, 1 reply; 15+ messages in thread
From: Nathan Lynch @ 2022-09-09 14:04 UTC (permalink / raw)
  To: Leonardo Brás; +Cc: linuxppc-dev

Hi Leonardo,

(restoring the list to the cc, hope that's ok)

Leonardo Brás <leobras.c@gmail.com> writes:
> On Wed, 2022-09-07 at 17:01 -0500, Nathan Lynch wrote:
>> At the time this was submitted by Leonardo, I confirmed -- or thought
>> I had confirmed -- with PowerVM partition firmware development that
>> the following RTAS functions:
>> 
>> - ibm,get-xive
>> - ibm,int-off
>> - ibm,int-on
>> - ibm,set-xive
>> 
>> were safe to call on multiple CPUs simultaneously, not only with
>> respect to themselves as indicated by PAPR, but with arbitrary other
>> RTAS calls:
>> 
>> https://lore.kernel.org/linuxppc-dev/875zcy2v8o.fsf@linux.ibm.com/
>> 
>> Recent discussion with firmware development makes it clear that this
>> is not true, and that the code in commit b664db8e3f97 ("powerpc/rtas:
>> Implement reentrant rtas call") is unsafe, likely explaining several
>> strange bugs we've seen in internal testing involving DLPAR and
>> LPM. These scenarios use ibm,configure-connector, whose internal state
>> can be corrupted by the concurrent use of the "reentrant" functions,
>> leading to symptoms like endless busy statuses from RTAS.
>
> Oh, does not it means PowerVM is not compliant to the PAPR specs?

No, it means the premise of commit b664db8e3f97 ("powerpc/rtas:
Implement reentrant rtas call") change is incorrect. The "reentrant"
property described in the spec applies only to the individual RTAS
functions. The OS can invoke (for example) ibm,set-xive on multiple CPUs
simultaneously, but it must adhere to the more general requirement to
serialize with other RTAS functions.

I don't claim that this is a useful property, at least not for
Linux. Maybe other OSes are better able to exploit it.

I apologize for my role in the confusion here. When reviewing the
original change, I talked to firmware development about the reentrant
property and how we wanted to use it. I've since reviewed that
conversation and concluded that I didn't communicate clearly, and that I
interpreted their responses too eagerly.

In the future, when we (pseries Linux developers at IBM) want to go
beyond the language of the spec, we probably should initiate an
architecture change to make the PAPR eventually align with our
implementation choices.

> I mentioned this in the original Commit, and it's still true to the last LoPAR:
>
> ###
> For "ibm,int-on", "ibm,int-off",ibm,get-xive" and  "ibm,set-xive".
>
> On LoPAPR Version 1.1 (March 24, 2016), from 7.3.10.1 to 7.3.10.4,
> items 2 and 3 say:
>
> 2 - For the PowerPC External Interrupt option: The * call must be
> reentrant to the number of processors on the platform.
> 3 - For the PowerPC External Interrupt option: The * argument call
> buffer for each simultaneous call must be physically unique.
>
> ### 
>
> Other than that, IIRC, this change was used to avoid issues that were happening
> on kdump/kexec: 
> If another thread was holding the rtas lock, kdump/kexec would get stuck waiting
> for the lock forever and never finish the process, often causing a bug
> reproduction's kdump to not get collected. 
>
> Is there any other fix for the above bug? 

Not that I'm aware of, but if this is a common failure mode for kdump,
then perhaps rtas_call() or related code should be made to ignore the
lock in the crash path, as a more general mitigation.

Then again, maybe it's not that urgent? Only XICS mode guests are
potentially affected. Do we even have this problem with
firmware-assisted dumps on PowerVM?

> Or is that a bug which is acceptable to have back, compared to the new
> one?

It was not acceptable to regress existing functions (DLPAR, LPM) in
exchange for making the crash path more robust. Reverting the change is
the correct tradeoff, unfortunately.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] Revert "powerpc/rtas: Implement reentrant rtas call"
  2022-09-09 14:04   ` Nathan Lynch
@ 2022-09-12 15:22     ` Leonardo Brás
  2022-09-12 19:58       ` Nathan Lynch
  0 siblings, 1 reply; 15+ messages in thread
From: Leonardo Brás @ 2022-09-12 15:22 UTC (permalink / raw)
  To: Nathan Lynch; +Cc: linuxppc-dev

On Fri, 2022-09-09 at 09:04 -0500, Nathan Lynch wrote:
> Hi Leonardo,
> 
> (restoring the list to the cc, hope that's ok)
> 

Sure, thanks for doing that. 
I probably had some mail composer issue here.

> Leonardo Brás <leobras.c@gmail.com> writes:
> > On Wed, 2022-09-07 at 17:01 -0500, Nathan Lynch wrote:
> > > At the time this was submitted by Leonardo, I confirmed -- or thought
> > > I had confirmed -- with PowerVM partition firmware development that
> > > the following RTAS functions:
> > > 
> > > - ibm,get-xive
> > > - ibm,int-off
> > > - ibm,int-on
> > > - ibm,set-xive
> > > 
> > > were safe to call on multiple CPUs simultaneously, not only with
> > > respect to themselves as indicated by PAPR, but with arbitrary other
> > > RTAS calls:
> > > 
> > > https://lore.kernel.org/linuxppc-dev/875zcy2v8o.fsf@linux.ibm.com/
> > > 
> > > Recent discussion with firmware development makes it clear that this
> > > is not true, and that the code in commit b664db8e3f97 ("powerpc/rtas:
> > > Implement reentrant rtas call") is unsafe, likely explaining several
> > > strange bugs we've seen in internal testing involving DLPAR and
> > > LPM. These scenarios use ibm,configure-connector, whose internal state
> > > can be corrupted by the concurrent use of the "reentrant" functions,
> > > leading to symptoms like endless busy statuses from RTAS.
> > 
> > Oh, does not it means PowerVM is not compliant to the PAPR specs?
> 
> No, it means the premise of commit b664db8e3f97 ("powerpc/rtas:
> Implement reentrant rtas call") change is incorrect. The "reentrant"
> property described in the spec applies only to the individual RTAS
> functions. The OS can invoke (for example) ibm,set-xive on multiple CPUs
> simultaneously, but it must adhere to the more general requirement to
> serialize with other RTAS functions.
> 

I see. Thanks for explaining that part!
I agree: reentrant calls that way don't look as useful on Linux than I
previously thought.

OTOH, I think that instead of reverting the change, we could make use of the
correct information and fix the current implementation. (This could help when we
do the same rtas call in multiple cpus)

I have an idea of a patch to fix this. 
Do you think it would be ok if I sent that, to prospect being an alternative to
this reversion?


> I don't claim that this is a useful property, at least not for
> Linux. Maybe other OSes are better able to exploit it.
> 
> I apologize for my role in the confusion here. When reviewing the
> original change, I talked to firmware development about the reentrant
> property and how we wanted to use it. I've since reviewed that
> conversation and concluded that I didn't communicate clearly, and that I
> interpreted their responses too eagerly.

No problem. Thanks for even talking to firmware people during original change!


> 
> In the future, when we (pseries Linux developers at IBM) want to go
> beyond the language of the spec, we probably should initiate an
> architecture change to make the PAPR eventually align with our
> implementation choices.

Yeah, it makes sense.

> 
> > I mentioned this in the original Commit, and it's still true to the last LoPAR:
> > 
> > ###
> > For "ibm,int-on", "ibm,int-off",ibm,get-xive" and  "ibm,set-xive".
> > 
> > On LoPAPR Version 1.1 (March 24, 2016), from 7.3.10.1 to 7.3.10.4,
> > items 2 and 3 say:
> > 
> > 2 - For the PowerPC External Interrupt option: The * call must be
> > reentrant to the number of processors on the platform.
> > 3 - For the PowerPC External Interrupt option: The * argument call
> > buffer for each simultaneous call must be physically unique.
> > 
> > ### 
> > 
> > Other than that, IIRC, this change was used to avoid issues that were happening
> > on kdump/kexec: 
> > If another thread was holding the rtas lock, kdump/kexec would get stuck waiting
> > for the lock forever and never finish the process, often causing a bug
> > reproduction's kdump to not get collected. 
> > 
> > Is there any other fix for the above bug? 
> 
> Not that I'm aware of, but if this is a common failure mode for kdump,
> then perhaps rtas_call() or related code should be made to ignore the
> lock in the crash path, as a more general mitigation.

It was the first idea at the time (bust the rtas lock), but I thought it could
raise some issues. Reentrant rtas calls seemed like a better solution at the
time.

> 
> Then again, maybe it's not that urgent? Only XICS mode guests are
> potentially affected. Do we even have this problem with
> firmware-assisted dumps on PowerVM?

It's been a lot of time, I don't recall all the details on that.

> 
> > Or is that a bug which is acceptable to have back, compared to the new
> > one?
> 
> It was not acceptable to regress existing functions (DLPAR, LPM) in
> exchange for making the crash path more robust. Reverting the change is
> the correct tradeoff, unfortunately.

I was talking about the kdump bug being acceptable, which by the previous
comment it is, or can be solved in other ways (busting rtas lock, as previously
mentioned).

Thanks Nathan!

Best regars, 
Leo

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] Revert "powerpc/rtas: Implement reentrant rtas call"
  2022-09-12 15:22     ` Leonardo Brás
@ 2022-09-12 19:58       ` Nathan Lynch
  2022-09-13 17:39         ` Leonardo Brás
  0 siblings, 1 reply; 15+ messages in thread
From: Nathan Lynch @ 2022-09-12 19:58 UTC (permalink / raw)
  To: Leonardo Brás; +Cc: linuxppc-dev

Leonardo Brás <leobras.c@gmail.com> writes:
> On Fri, 2022-09-09 at 09:04 -0500, Nathan Lynch wrote:
>> Leonardo Brás <leobras.c@gmail.com> writes:
>> > On Wed, 2022-09-07 at 17:01 -0500, Nathan Lynch wrote:
>> > > At the time this was submitted by Leonardo, I confirmed -- or thought
>> > > I had confirmed -- with PowerVM partition firmware development that
>> > > the following RTAS functions:
>> > > 
>> > > - ibm,get-xive
>> > > - ibm,int-off
>> > > - ibm,int-on
>> > > - ibm,set-xive
>> > > 
>> > > were safe to call on multiple CPUs simultaneously, not only with
>> > > respect to themselves as indicated by PAPR, but with arbitrary other
>> > > RTAS calls:
>> > > 
>> > > https://lore.kernel.org/linuxppc-dev/875zcy2v8o.fsf@linux.ibm.com/
>> > > 
>> > > Recent discussion with firmware development makes it clear that this
>> > > is not true, and that the code in commit b664db8e3f97 ("powerpc/rtas:
>> > > Implement reentrant rtas call") is unsafe, likely explaining several
>> > > strange bugs we've seen in internal testing involving DLPAR and
>> > > LPM. These scenarios use ibm,configure-connector, whose internal state
>> > > can be corrupted by the concurrent use of the "reentrant" functions,
>> > > leading to symptoms like endless busy statuses from RTAS.
>> > 
>> > Oh, does not it means PowerVM is not compliant to the PAPR specs?
>> 
>> No, it means the premise of commit b664db8e3f97 ("powerpc/rtas:
>> Implement reentrant rtas call") change is incorrect. The "reentrant"
>> property described in the spec applies only to the individual RTAS
>> functions. The OS can invoke (for example) ibm,set-xive on multiple CPUs
>> simultaneously, but it must adhere to the more general requirement to
>> serialize with other RTAS functions.
>> 
>
> I see. Thanks for explaining that part!
> I agree: reentrant calls that way don't look as useful on Linux than I
> previously thought.
>
> OTOH, I think that instead of reverting the change, we could make use of the
> correct information and fix the current implementation. (This could help when we
> do the same rtas call in multiple cpus)

Hmm I'm happy to be mistaken here, but I doubt we ever really need to do
that. I'm not seeing the need.

> I have an idea of a patch to fix this. 
> Do you think it would be ok if I sent that, to prospect being an alternative to
> this reversion?

It is my preference, and I believe it is more common, to revert to the
well-understood prior state, imperfect as it may be. The revert can be
backported to -stable and distros while development and review of
another approach proceeds.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] Revert "powerpc/rtas: Implement reentrant rtas call"
  2022-09-12 19:58       ` Nathan Lynch
@ 2022-09-13 17:39         ` Leonardo Brás
  2022-09-16  1:31           ` Nicholas Piggin
  0 siblings, 1 reply; 15+ messages in thread
From: Leonardo Brás @ 2022-09-13 17:39 UTC (permalink / raw)
  To: Nathan Lynch; +Cc: linuxppc-dev

On Mon, 2022-09-12 at 14:58 -0500, Nathan Lynch wrote:
> Leonardo Brás <leobras.c@gmail.com> writes:
> > On Fri, 2022-09-09 at 09:04 -0500, Nathan Lynch wrote:
> > > Leonardo Brás <leobras.c@gmail.com> writes:
> > > > On Wed, 2022-09-07 at 17:01 -0500, Nathan Lynch wrote:
> > > > > At the time this was submitted by Leonardo, I confirmed -- or thought
> > > > > I had confirmed -- with PowerVM partition firmware development that
> > > > > the following RTAS functions:
> > > > > 
> > > > > - ibm,get-xive
> > > > > - ibm,int-off
> > > > > - ibm,int-on
> > > > > - ibm,set-xive
> > > > > 
> > > > > were safe to call on multiple CPUs simultaneously, not only with
> > > > > respect to themselves as indicated by PAPR, but with arbitrary other
> > > > > RTAS calls:
> > > > > 
> > > > > https://lore.kernel.org/linuxppc-dev/875zcy2v8o.fsf@linux.ibm.com/
> > > > > 
> > > > > Recent discussion with firmware development makes it clear that this
> > > > > is not true, and that the code in commit b664db8e3f97 ("powerpc/rtas:
> > > > > Implement reentrant rtas call") is unsafe, likely explaining several
> > > > > strange bugs we've seen in internal testing involving DLPAR and
> > > > > LPM. These scenarios use ibm,configure-connector, whose internal state
> > > > > can be corrupted by the concurrent use of the "reentrant" functions,
> > > > > leading to symptoms like endless busy statuses from RTAS.
> > > > 
> > > > Oh, does not it means PowerVM is not compliant to the PAPR specs?
> > > 
> > > No, it means the premise of commit b664db8e3f97 ("powerpc/rtas:
> > > Implement reentrant rtas call") change is incorrect. The "reentrant"
> > > property described in the spec applies only to the individual RTAS
> > > functions. The OS can invoke (for example) ibm,set-xive on multiple CPUs
> > > simultaneously, but it must adhere to the more general requirement to
> > > serialize with other RTAS functions.
> > > 
> > 
> > I see. Thanks for explaining that part!
> > I agree: reentrant calls that way don't look as useful on Linux than I
> > previously thought.
> > 
> > OTOH, I think that instead of reverting the change, we could make use of the
> > correct information and fix the current implementation. (This could help when we
> > do the same rtas call in multiple cpus)
> 
> Hmm I'm happy to be mistaken here, but I doubt we ever really need to do
> that. I'm not seeing the need.
> 
> > I have an idea of a patch to fix this. 
> > Do you think it would be ok if I sent that, to prospect being an alternative to
> > this reversion?
> 
> It is my preference, and I believe it is more common, to revert to the
> well-understood prior state, imperfect as it may be. The revert can be
> backported to -stable and distros while development and review of
> another approach proceeds.

Ok then, as long as you are aware of the kdump bug, I'm good.

FWIW:
Reviewed-by: Leonardo Bras <leobras.c@gmail.com>


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] Revert "powerpc/rtas: Implement reentrant rtas call"
  2022-09-13 17:39         ` Leonardo Brás
@ 2022-09-16  1:31           ` Nicholas Piggin
  2022-09-16 21:56             ` Nathan Lynch
  0 siblings, 1 reply; 15+ messages in thread
From: Nicholas Piggin @ 2022-09-16  1:31 UTC (permalink / raw)
  To: Leonardo Brás, Nathan Lynch; +Cc: linuxppc-dev

On Wed Sep 14, 2022 at 3:39 AM AEST, Leonardo Brás wrote:
> On Mon, 2022-09-12 at 14:58 -0500, Nathan Lynch wrote:
> > Leonardo Brás <leobras.c@gmail.com> writes:
> > > On Fri, 2022-09-09 at 09:04 -0500, Nathan Lynch wrote:
> > > > Leonardo Brás <leobras.c@gmail.com> writes:
> > > > > On Wed, 2022-09-07 at 17:01 -0500, Nathan Lynch wrote:
> > > > > > At the time this was submitted by Leonardo, I confirmed -- or thought
> > > > > > I had confirmed -- with PowerVM partition firmware development that
> > > > > > the following RTAS functions:
> > > > > > 
> > > > > > - ibm,get-xive
> > > > > > - ibm,int-off
> > > > > > - ibm,int-on
> > > > > > - ibm,set-xive
> > > > > > 
> > > > > > were safe to call on multiple CPUs simultaneously, not only with
> > > > > > respect to themselves as indicated by PAPR, but with arbitrary other
> > > > > > RTAS calls:
> > > > > > 
> > > > > > https://lore.kernel.org/linuxppc-dev/875zcy2v8o.fsf@linux.ibm.com/
> > > > > > 
> > > > > > Recent discussion with firmware development makes it clear that this
> > > > > > is not true, and that the code in commit b664db8e3f97 ("powerpc/rtas:
> > > > > > Implement reentrant rtas call") is unsafe, likely explaining several
> > > > > > strange bugs we've seen in internal testing involving DLPAR and
> > > > > > LPM. These scenarios use ibm,configure-connector, whose internal state
> > > > > > can be corrupted by the concurrent use of the "reentrant" functions,
> > > > > > leading to symptoms like endless busy statuses from RTAS.
> > > > > 
> > > > > Oh, does not it means PowerVM is not compliant to the PAPR specs?
> > > > 
> > > > No, it means the premise of commit b664db8e3f97 ("powerpc/rtas:
> > > > Implement reentrant rtas call") change is incorrect. The "reentrant"
> > > > property described in the spec applies only to the individual RTAS
> > > > functions. The OS can invoke (for example) ibm,set-xive on multiple CPUs
> > > > simultaneously, but it must adhere to the more general requirement to
> > > > serialize with other RTAS functions.
> > > > 
> > > 
> > > I see. Thanks for explaining that part!
> > > I agree: reentrant calls that way don't look as useful on Linux than I
> > > previously thought.
> > > 
> > > OTOH, I think that instead of reverting the change, we could make use of the
> > > correct information and fix the current implementation. (This could help when we
> > > do the same rtas call in multiple cpus)
> > 
> > Hmm I'm happy to be mistaken here, but I doubt we ever really need to do
> > that. I'm not seeing the need.
> > 
> > > I have an idea of a patch to fix this. 
> > > Do you think it would be ok if I sent that, to prospect being an alternative to
> > > this reversion?
> > 
> > It is my preference, and I believe it is more common, to revert to the
> > well-understood prior state, imperfect as it may be. The revert can be
> > backported to -stable and distros while development and review of
> > another approach proceeds.
>
> Ok then, as long as you are aware of the kdump bug, I'm good.
>
> FWIW:
> Reviewed-by: Leonardo Bras <leobras.c@gmail.com>

A shame. I guess a reader/writer lock would not be much help because
the crash is probably more likely to hit longer running rtas calls?

Alternative is just cheat and do this...?

Thanks,
Nick

diff --git a/arch/powerpc/kernel/rtas.c b/arch/powerpc/kernel/rtas.c
index 693133972294..89728714a06e 100644
--- a/arch/powerpc/kernel/rtas.c
+++ b/arch/powerpc/kernel/rtas.c
@@ -26,6 +26,7 @@
 #include <linux/syscalls.h>
 #include <linux/of.h>
 #include <linux/of_fdt.h>
+#include <linux/panic.h>
 
 #include <asm/interrupt.h>
 #include <asm/rtas.h>
@@ -97,6 +98,19 @@ static unsigned long lock_rtas(void)
 {
        unsigned long flags;
 
+       if (atomic_read(&panic_cpu) == raw_smp_processor_id()) {
+               /*
+                * Crash in progress on this CPU. Other CPUs should be
+                * stopped by now, so skip the lock in case it was being
+                * held, and is now needed for crashing e.g., kexec
+                * (machine_kexec_mask_interrupts) requires rtas calls.
+                *
+                * It's possible this could have caused rtas state
breakage
+                * but the alternative is deadlock.
+                */
+               return 0;
+       }
+
        local_irq_save(flags);
        preempt_disable();
        arch_spin_lock(&rtas.lock);
@@ -105,6 +119,9 @@ static unsigned long lock_rtas(void)
 
 static void unlock_rtas(unsigned long flags)
 {
+       if (atomic_read(&panic_cpu) == raw_smp_processor_id())
+               return;
+
        arch_spin_unlock(&rtas.lock);
        local_irq_restore(flags);
        preempt_enable();


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH] Revert "powerpc/rtas: Implement reentrant rtas call"
  2022-09-16  1:31           ` Nicholas Piggin
@ 2022-09-16 21:56             ` Nathan Lynch
  2022-09-19 13:51               ` Nathan Lynch
  2023-04-14 14:20               ` Michal Suchánek
  0 siblings, 2 replies; 15+ messages in thread
From: Nathan Lynch @ 2022-09-16 21:56 UTC (permalink / raw)
  To: Nicholas Piggin, Leonardo Brás; +Cc: linuxppc-dev

"Nicholas Piggin" <npiggin@gmail.com> writes:
> On Wed Sep 14, 2022 at 3:39 AM AEST, Leonardo Brás wrote:
>> On Mon, 2022-09-12 at 14:58 -0500, Nathan Lynch wrote:
>> > Leonardo Brás <leobras.c@gmail.com> writes:
>> > > On Fri, 2022-09-09 at 09:04 -0500, Nathan Lynch wrote:
>> > > > Leonardo Brás <leobras.c@gmail.com> writes:
>> > > > > On Wed, 2022-09-07 at 17:01 -0500, Nathan Lynch wrote:
>> > > > > > At the time this was submitted by Leonardo, I confirmed -- or thought
>> > > > > > I had confirmed -- with PowerVM partition firmware development that
>> > > > > > the following RTAS functions:
>> > > > > > 
>> > > > > > - ibm,get-xive
>> > > > > > - ibm,int-off
>> > > > > > - ibm,int-on
>> > > > > > - ibm,set-xive
>> > > > > > 
>> > > > > > were safe to call on multiple CPUs simultaneously, not only with
>> > > > > > respect to themselves as indicated by PAPR, but with arbitrary other
>> > > > > > RTAS calls:
>> > > > > > 
>> > > > > > https://lore.kernel.org/linuxppc-dev/875zcy2v8o.fsf@linux.ibm.com/
>> > > > > > 
>> > > > > > Recent discussion with firmware development makes it clear that this
>> > > > > > is not true, and that the code in commit b664db8e3f97 ("powerpc/rtas:
>> > > > > > Implement reentrant rtas call") is unsafe, likely explaining several
>> > > > > > strange bugs we've seen in internal testing involving DLPAR and
>> > > > > > LPM. These scenarios use ibm,configure-connector, whose internal state
>> > > > > > can be corrupted by the concurrent use of the "reentrant" functions,
>> > > > > > leading to symptoms like endless busy statuses from RTAS.
>> > > > > 
>> > > > > Oh, does not it means PowerVM is not compliant to the PAPR specs?
>> > > > 
>> > > > No, it means the premise of commit b664db8e3f97 ("powerpc/rtas:
>> > > > Implement reentrant rtas call") change is incorrect. The "reentrant"
>> > > > property described in the spec applies only to the individual RTAS
>> > > > functions. The OS can invoke (for example) ibm,set-xive on multiple CPUs
>> > > > simultaneously, but it must adhere to the more general requirement to
>> > > > serialize with other RTAS functions.
>> > > > 
>> > > 
>> > > I see. Thanks for explaining that part!
>> > > I agree: reentrant calls that way don't look as useful on Linux than I
>> > > previously thought.
>> > > 
>> > > OTOH, I think that instead of reverting the change, we could make use of the
>> > > correct information and fix the current implementation. (This could help when we
>> > > do the same rtas call in multiple cpus)
>> > 
>> > Hmm I'm happy to be mistaken here, but I doubt we ever really need to do
>> > that. I'm not seeing the need.
>> > 
>> > > I have an idea of a patch to fix this. 
>> > > Do you think it would be ok if I sent that, to prospect being an alternative to
>> > > this reversion?
>> > 
>> > It is my preference, and I believe it is more common, to revert to the
>> > well-understood prior state, imperfect as it may be. The revert can be
>> > backported to -stable and distros while development and review of
>> > another approach proceeds.
>>
>> Ok then, as long as you are aware of the kdump bug, I'm good.
>>
>> FWIW:
>> Reviewed-by: Leonardo Bras <leobras.c@gmail.com>
>
> A shame. I guess a reader/writer lock would not be much help because
> the crash is probably more likely to hit longer running rtas calls?
>
> Alternative is just cheat and do this...?
>
> Thanks,
> Nick
>
> diff --git a/arch/powerpc/kernel/rtas.c b/arch/powerpc/kernel/rtas.c
> index 693133972294..89728714a06e 100644
> --- a/arch/powerpc/kernel/rtas.c
> +++ b/arch/powerpc/kernel/rtas.c
> @@ -26,6 +26,7 @@
>  #include <linux/syscalls.h>
>  #include <linux/of.h>
>  #include <linux/of_fdt.h>
> +#include <linux/panic.h>
>  
>  #include <asm/interrupt.h>
>  #include <asm/rtas.h>
> @@ -97,6 +98,19 @@ static unsigned long lock_rtas(void)
>  {
>         unsigned long flags;
>  
> +       if (atomic_read(&panic_cpu) == raw_smp_processor_id()) {
> +               /*
> +                * Crash in progress on this CPU. Other CPUs should be
> +                * stopped by now, so skip the lock in case it was being
> +                * held, and is now needed for crashing e.g., kexec
> +                * (machine_kexec_mask_interrupts) requires rtas calls.
> +                *
> +                * It's possible this could have caused rtas state
> breakage
> +                * but the alternative is deadlock.
> +                */
> +               return 0;
> +       }
> +
>         local_irq_save(flags);
>         preempt_disable();
>         arch_spin_lock(&rtas.lock);
> @@ -105,6 +119,9 @@ static unsigned long lock_rtas(void)
>  
>  static void unlock_rtas(unsigned long flags)
>  {
> +       if (atomic_read(&panic_cpu) == raw_smp_processor_id())
> +               return;
> +
>         arch_spin_unlock(&rtas.lock);
>         local_irq_restore(flags);
>         preempt_enable();

Looks correct.

I wonder - would it be worth making the panic path use a separate
"emergency" rtas_args buffer as well? If a CPU is actually "stuck" in
RTAS at panic time, then leaving rtas.args untouched might make the
ibm,int-off, ibm,set-xive, ibm,os-term, and any other RTAS calls we
incur on the panic path more likely to succeed.

Building on yours, something like (sorry, it's ugly):

diff --git a/arch/powerpc/kernel/rtas.c b/arch/powerpc/kernel/rtas.c
index 693133972294..4865d26e7391 100644
--- a/arch/powerpc/kernel/rtas.c
+++ b/arch/powerpc/kernel/rtas.c
@@ -73,6 +73,8 @@ struct rtas_t rtas = {
 };
 EXPORT_SYMBOL(rtas);
 
+static struct rtas_args emergency_rtas_args;
+
 DEFINE_SPINLOCK(rtas_data_buf_lock);
 EXPORT_SYMBOL(rtas_data_buf_lock);
 
@@ -93,20 +95,24 @@ EXPORT_SYMBOL(rtas_flash_term_hook);
  * such as having the timebase stopped which would lockup with
  * normal locks and spinlock debugging enabled
  */
-static unsigned long lock_rtas(void)
+static struct rtas_args *lock_rtas(unsigned long *flags)
 {
-	unsigned long flags;
+	if (atomic_read(&panic_cpu) == raw_smp_processor_id())
+		return &emergency_rtas_args;
 
-	local_irq_save(flags);
+	local_irq_save(*flags);
 	preempt_disable();
 	arch_spin_lock(&rtas.lock);
-	return flags;
+	return &rtas.args;
 }
 
-static void unlock_rtas(unsigned long flags)
+static void unlock_rtas(struct rtas_args *args, unsigned long *flags)
 {
+	if (atomic_read(&panic_cpu) == raw_smp_processor_id())
+		return;
+
 	arch_spin_unlock(&rtas.lock);
-	local_irq_restore(flags);
+	local_irq_restore(*flags);
 	preempt_enable();
 }
 
@@ -117,14 +123,15 @@ static void unlock_rtas(unsigned long flags)
  */
 static void call_rtas_display_status(unsigned char c)
 {
-	unsigned long s;
+	struct rtas_args *args;
+	unsigned long flags;
 
 	if (!rtas.base)
 		return;
 
-	s = lock_rtas();
-	rtas_call_unlocked(&rtas.args, 10, 1, 1, NULL, c);
-	unlock_rtas(s);
+	args = lock_rtas(&flags);
+	rtas_call_unlocked(args, 10, 1, 1, NULL, c);
+	unlock_rtas(args, &flags);
 }
 
 static void call_rtas_display_status_delay(char c)
@@ -468,7 +475,7 @@ int rtas_call(int token, int nargs, int nret, int *outputs, ...)
 {
 	va_list list;
 	int i;
-	unsigned long s;
+	unsigned long flags;
 	struct rtas_args *rtas_args;
 	char *buff_copy = NULL;
 	int ret;
@@ -481,10 +488,7 @@ int rtas_call(int token, int nargs, int nret, int *outputs, ...)
 		return -1;
 	}
 
-	s = lock_rtas();
-
-	/* We use the global rtas args buffer */
-	rtas_args = &rtas.args;
+	rtas_args = lock_rtas(&flags);
 
 	va_start(list, outputs);
 	va_rtas_call_unlocked(rtas_args, token, nargs, nret, list);
@@ -500,7 +504,7 @@ int rtas_call(int token, int nargs, int nret, int *outputs, ...)
 			outputs[i] = be32_to_cpu(rtas_args->rets[i+1]);
 	ret = (nret > 0)? be32_to_cpu(rtas_args->rets[0]): 0;
 
-	unlock_rtas(s);
+	unlock_rtas(rtas_args, &flags);
 
 	if (buff_copy) {
 		log_error(buff_copy, ERR_TYPE_RTAS_LOG, 0);
@@ -1190,6 +1194,7 @@ static void __init rtas_syscall_filter_init(void)
 /* We assume to be passed big endian arguments */
 SYSCALL_DEFINE1(rtas, struct rtas_args __user *, uargs)
 {
+	struct rtas_args *argsp;
 	struct rtas_args args;
 	unsigned long flags;
 	char *buff_copy, *errbuf = NULL;
@@ -1249,18 +1254,18 @@ SYSCALL_DEFINE1(rtas, struct rtas_args __user *, uargs)
 
 	buff_copy = get_errorlog_buffer();
 
-	flags = lock_rtas();
+	argsp = lock_rtas(&flags);
 
-	rtas.args = args;
-	do_enter_rtas(__pa(&rtas.args));
-	args = rtas.args;
+	*argsp = args;
+	do_enter_rtas(__pa(argsp));
+	args = *argsp;
 
 	/* A -1 return code indicates that the last command couldn't
 	   be completed due to a hardware error. */
 	if (be32_to_cpu(args.rets[0]) == -1)
 		errbuf = __fetch_rtas_last_error(buff_copy);
 
-	unlock_rtas(flags);
+	unlock_rtas(argsp, &flags);
 
 	if (buff_copy) {
 		if (errbuf)

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH] Revert "powerpc/rtas: Implement reentrant rtas call"
  2022-09-16 21:56             ` Nathan Lynch
@ 2022-09-19 13:51               ` Nathan Lynch
  2022-09-19 23:45                 ` Michael Ellerman
  2022-09-20  3:54                 ` Nicholas Piggin
  2023-04-14 14:20               ` Michal Suchánek
  1 sibling, 2 replies; 15+ messages in thread
From: Nathan Lynch @ 2022-09-19 13:51 UTC (permalink / raw)
  To: Nicholas Piggin, Leonardo Brás; +Cc: linuxppc-dev

Nathan Lynch <nathanl@linux.ibm.com> writes:
> "Nicholas Piggin" <npiggin@gmail.com> writes:
>> On Wed Sep 14, 2022 at 3:39 AM AEST, Leonardo Brás wrote:
>>> On Mon, 2022-09-12 at 14:58 -0500, Nathan Lynch wrote:
>>> > Leonardo Brás <leobras.c@gmail.com> writes:
>>> > > On Fri, 2022-09-09 at 09:04 -0500, Nathan Lynch wrote:
>>> > > > Leonardo Brás <leobras.c@gmail.com> writes:
>>> > > > > On Wed, 2022-09-07 at 17:01 -0500, Nathan Lynch wrote:
>>> > > > > > At the time this was submitted by Leonardo, I confirmed -- or thought
>>> > > > > > I had confirmed -- with PowerVM partition firmware development that
>>> > > > > > the following RTAS functions:
>>> > > > > > 
>>> > > > > > - ibm,get-xive
>>> > > > > > - ibm,int-off
>>> > > > > > - ibm,int-on
>>> > > > > > - ibm,set-xive
>>> > > > > > 
>>> > > > > > were safe to call on multiple CPUs simultaneously, not only with
>>> > > > > > respect to themselves as indicated by PAPR, but with arbitrary other
>>> > > > > > RTAS calls:
>>> > > > > > 
>>> > > > > > https://lore.kernel.org/linuxppc-dev/875zcy2v8o.fsf@linux.ibm.com/
>>> > > > > > 
>>> > > > > > Recent discussion with firmware development makes it clear that this
>>> > > > > > is not true, and that the code in commit b664db8e3f97 ("powerpc/rtas:
>>> > > > > > Implement reentrant rtas call") is unsafe, likely explaining several
>>> > > > > > strange bugs we've seen in internal testing involving DLPAR and
>>> > > > > > LPM. These scenarios use ibm,configure-connector, whose internal state
>>> > > > > > can be corrupted by the concurrent use of the "reentrant" functions,
>>> > > > > > leading to symptoms like endless busy statuses from RTAS.
>>> > > > > 
>>> > > > > Oh, does not it means PowerVM is not compliant to the PAPR specs?
>>> > > > 
>>> > > > No, it means the premise of commit b664db8e3f97 ("powerpc/rtas:
>>> > > > Implement reentrant rtas call") change is incorrect. The "reentrant"
>>> > > > property described in the spec applies only to the individual RTAS
>>> > > > functions. The OS can invoke (for example) ibm,set-xive on multiple CPUs
>>> > > > simultaneously, but it must adhere to the more general requirement to
>>> > > > serialize with other RTAS functions.
>>> > > > 
>>> > > 
>>> > > I see. Thanks for explaining that part!
>>> > > I agree: reentrant calls that way don't look as useful on Linux than I
>>> > > previously thought.
>>> > > 
>>> > > OTOH, I think that instead of reverting the change, we could make use of the
>>> > > correct information and fix the current implementation. (This could help when we
>>> > > do the same rtas call in multiple cpus)
>>> > 
>>> > Hmm I'm happy to be mistaken here, but I doubt we ever really need to do
>>> > that. I'm not seeing the need.
>>> > 
>>> > > I have an idea of a patch to fix this. 
>>> > > Do you think it would be ok if I sent that, to prospect being an alternative to
>>> > > this reversion?
>>> > 
>>> > It is my preference, and I believe it is more common, to revert to the
>>> > well-understood prior state, imperfect as it may be. The revert can be
>>> > backported to -stable and distros while development and review of
>>> > another approach proceeds.
>>>
>>> Ok then, as long as you are aware of the kdump bug, I'm good.
>>>
>>> FWIW:
>>> Reviewed-by: Leonardo Bras <leobras.c@gmail.com>
>>
>> A shame. I guess a reader/writer lock would not be much help because
>> the crash is probably more likely to hit longer running rtas calls?
>>
>> Alternative is just cheat and do this...?

[...]

>
> I wonder - would it be worth making the panic path use a separate
> "emergency" rtas_args buffer as well? If a CPU is actually "stuck" in
> RTAS at panic time, then leaving rtas.args untouched might make the
> ibm,int-off, ibm,set-xive, ibm,os-term, and any other RTAS calls we
> incur on the panic path more likely to succeed.

Regardless, I request that we proceed with the revert while the crash
path hardening gets sorted out. If I understand the motivation behind
commit b664db8e3f97 ("powerpc/rtas: Implement reentrant rtas call"),
then it looks like it was incomplete anyway? rtas_os_term() still takes
the lock when calling ibm,os-term.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] Revert "powerpc/rtas: Implement reentrant rtas call"
  2022-09-19 13:51               ` Nathan Lynch
@ 2022-09-19 23:45                 ` Michael Ellerman
  2022-09-20  3:54                 ` Nicholas Piggin
  1 sibling, 0 replies; 15+ messages in thread
From: Michael Ellerman @ 2022-09-19 23:45 UTC (permalink / raw)
  To: Nathan Lynch, Nicholas Piggin, Leonardo Brás; +Cc: linuxppc-dev

Nathan Lynch <nathanl@linux.ibm.com> writes:
> Nathan Lynch <nathanl@linux.ibm.com> writes:
>> "Nicholas Piggin" <npiggin@gmail.com> writes:
>>> On Wed Sep 14, 2022 at 3:39 AM AEST, Leonardo Brás wrote:
>>>> On Mon, 2022-09-12 at 14:58 -0500, Nathan Lynch wrote:
>>>> > Leonardo Brás <leobras.c@gmail.com> writes:
>>>> > > On Fri, 2022-09-09 at 09:04 -0500, Nathan Lynch wrote:
>>>> > > > Leonardo Brás <leobras.c@gmail.com> writes:
>>>> > > > > On Wed, 2022-09-07 at 17:01 -0500, Nathan Lynch wrote:
>>>> > > > > > At the time this was submitted by Leonardo, I confirmed -- or thought
>>>> > > > > > I had confirmed -- with PowerVM partition firmware development that
>>>> > > > > > the following RTAS functions:
>>>> > > > > > 
>>>> > > > > > - ibm,get-xive
>>>> > > > > > - ibm,int-off
>>>> > > > > > - ibm,int-on
>>>> > > > > > - ibm,set-xive
>>>> > > > > > 
>>>> > > > > > were safe to call on multiple CPUs simultaneously, not only with
>>>> > > > > > respect to themselves as indicated by PAPR, but with arbitrary other
>>>> > > > > > RTAS calls:
>>>> > > > > > 
>>>> > > > > > https://lore.kernel.org/linuxppc-dev/875zcy2v8o.fsf@linux.ibm.com/
>>>> > > > > > 
>>>> > > > > > Recent discussion with firmware development makes it clear that this
>>>> > > > > > is not true, and that the code in commit b664db8e3f97 ("powerpc/rtas:
>>>> > > > > > Implement reentrant rtas call") is unsafe, likely explaining several
>>>> > > > > > strange bugs we've seen in internal testing involving DLPAR and
>>>> > > > > > LPM. These scenarios use ibm,configure-connector, whose internal state
>>>> > > > > > can be corrupted by the concurrent use of the "reentrant" functions,
>>>> > > > > > leading to symptoms like endless busy statuses from RTAS.
>>>> > > > > 
>>>> > > > > Oh, does not it means PowerVM is not compliant to the PAPR specs?
>>>> > > > 
>>>> > > > No, it means the premise of commit b664db8e3f97 ("powerpc/rtas:
>>>> > > > Implement reentrant rtas call") change is incorrect. The "reentrant"
>>>> > > > property described in the spec applies only to the individual RTAS
>>>> > > > functions. The OS can invoke (for example) ibm,set-xive on multiple CPUs
>>>> > > > simultaneously, but it must adhere to the more general requirement to
>>>> > > > serialize with other RTAS functions.
>>>> > > > 
>>>> > > 
>>>> > > I see. Thanks for explaining that part!
>>>> > > I agree: reentrant calls that way don't look as useful on Linux than I
>>>> > > previously thought.
>>>> > > 
>>>> > > OTOH, I think that instead of reverting the change, we could make use of the
>>>> > > correct information and fix the current implementation. (This could help when we
>>>> > > do the same rtas call in multiple cpus)
>>>> > 
>>>> > Hmm I'm happy to be mistaken here, but I doubt we ever really need to do
>>>> > that. I'm not seeing the need.
>>>> > 
>>>> > > I have an idea of a patch to fix this. 
>>>> > > Do you think it would be ok if I sent that, to prospect being an alternative to
>>>> > > this reversion?
>>>> > 
>>>> > It is my preference, and I believe it is more common, to revert to the
>>>> > well-understood prior state, imperfect as it may be. The revert can be
>>>> > backported to -stable and distros while development and review of
>>>> > another approach proceeds.
>>>>
>>>> Ok then, as long as you are aware of the kdump bug, I'm good.
>>>>
>>>> FWIW:
>>>> Reviewed-by: Leonardo Bras <leobras.c@gmail.com>
>>>
>>> A shame. I guess a reader/writer lock would not be much help because
>>> the crash is probably more likely to hit longer running rtas calls?
>>>
>>> Alternative is just cheat and do this...?
>
> [...]
>
>>
>> I wonder - would it be worth making the panic path use a separate
>> "emergency" rtas_args buffer as well? If a CPU is actually "stuck" in
>> RTAS at panic time, then leaving rtas.args untouched might make the
>> ibm,int-off, ibm,set-xive, ibm,os-term, and any other RTAS calls we
>> incur on the panic path more likely to succeed.
>
> Regardless, I request that we proceed with the revert while the crash
> path hardening gets sorted out.

It's in next, I just haven't sent out the mails yet because I'm
disorganised :)

> If I understand the motivation behind
> commit b664db8e3f97 ("powerpc/rtas: Implement reentrant rtas call"),
> then it looks like it was incomplete anyway? rtas_os_term() still takes
> the lock when calling ibm,os-term.

The original report was for kdump, which doesn't use rtas_os_term() AFAIK.

cheers

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] Revert "powerpc/rtas: Implement reentrant rtas call"
  2022-09-19 13:51               ` Nathan Lynch
  2022-09-19 23:45                 ` Michael Ellerman
@ 2022-09-20  3:54                 ` Nicholas Piggin
  2022-09-21 15:54                   ` Nathan Lynch
  1 sibling, 1 reply; 15+ messages in thread
From: Nicholas Piggin @ 2022-09-20  3:54 UTC (permalink / raw)
  To: Nathan Lynch, Leonardo Brás; +Cc: linuxppc-dev

On Mon Sep 19, 2022 at 11:51 PM AEST, Nathan Lynch wrote:
> Nathan Lynch <nathanl@linux.ibm.com> writes:
> > "Nicholas Piggin" <npiggin@gmail.com> writes:
> >> On Wed Sep 14, 2022 at 3:39 AM AEST, Leonardo Brás wrote:
> >>> On Mon, 2022-09-12 at 14:58 -0500, Nathan Lynch wrote:
> >>> > Leonardo Brás <leobras.c@gmail.com> writes:
> >>> > > On Fri, 2022-09-09 at 09:04 -0500, Nathan Lynch wrote:
> >>> > > > Leonardo Brás <leobras.c@gmail.com> writes:
> >>> > > > > On Wed, 2022-09-07 at 17:01 -0500, Nathan Lynch wrote:
> >>> > > > > > At the time this was submitted by Leonardo, I confirmed -- or thought
> >>> > > > > > I had confirmed -- with PowerVM partition firmware development that
> >>> > > > > > the following RTAS functions:
> >>> > > > > > 
> >>> > > > > > - ibm,get-xive
> >>> > > > > > - ibm,int-off
> >>> > > > > > - ibm,int-on
> >>> > > > > > - ibm,set-xive
> >>> > > > > > 
> >>> > > > > > were safe to call on multiple CPUs simultaneously, not only with
> >>> > > > > > respect to themselves as indicated by PAPR, but with arbitrary other
> >>> > > > > > RTAS calls:
> >>> > > > > > 
> >>> > > > > > https://lore.kernel.org/linuxppc-dev/875zcy2v8o.fsf@linux.ibm.com/
> >>> > > > > > 
> >>> > > > > > Recent discussion with firmware development makes it clear that this
> >>> > > > > > is not true, and that the code in commit b664db8e3f97 ("powerpc/rtas:
> >>> > > > > > Implement reentrant rtas call") is unsafe, likely explaining several
> >>> > > > > > strange bugs we've seen in internal testing involving DLPAR and
> >>> > > > > > LPM. These scenarios use ibm,configure-connector, whose internal state
> >>> > > > > > can be corrupted by the concurrent use of the "reentrant" functions,
> >>> > > > > > leading to symptoms like endless busy statuses from RTAS.
> >>> > > > > 
> >>> > > > > Oh, does not it means PowerVM is not compliant to the PAPR specs?
> >>> > > > 
> >>> > > > No, it means the premise of commit b664db8e3f97 ("powerpc/rtas:
> >>> > > > Implement reentrant rtas call") change is incorrect. The "reentrant"
> >>> > > > property described in the spec applies only to the individual RTAS
> >>> > > > functions. The OS can invoke (for example) ibm,set-xive on multiple CPUs
> >>> > > > simultaneously, but it must adhere to the more general requirement to
> >>> > > > serialize with other RTAS functions.
> >>> > > > 
> >>> > > 
> >>> > > I see. Thanks for explaining that part!
> >>> > > I agree: reentrant calls that way don't look as useful on Linux than I
> >>> > > previously thought.
> >>> > > 
> >>> > > OTOH, I think that instead of reverting the change, we could make use of the
> >>> > > correct information and fix the current implementation. (This could help when we
> >>> > > do the same rtas call in multiple cpus)
> >>> > 
> >>> > Hmm I'm happy to be mistaken here, but I doubt we ever really need to do
> >>> > that. I'm not seeing the need.
> >>> > 
> >>> > > I have an idea of a patch to fix this. 
> >>> > > Do you think it would be ok if I sent that, to prospect being an alternative to
> >>> > > this reversion?
> >>> > 
> >>> > It is my preference, and I believe it is more common, to revert to the
> >>> > well-understood prior state, imperfect as it may be. The revert can be
> >>> > backported to -stable and distros while development and review of
> >>> > another approach proceeds.
> >>>
> >>> Ok then, as long as you are aware of the kdump bug, I'm good.
> >>>
> >>> FWIW:
> >>> Reviewed-by: Leonardo Bras <leobras.c@gmail.com>
> >>
> >> A shame. I guess a reader/writer lock would not be much help because
> >> the crash is probably more likely to hit longer running rtas calls?
> >>
> >> Alternative is just cheat and do this...?
>
> [...]
>
> >
> > I wonder - would it be worth making the panic path use a separate
> > "emergency" rtas_args buffer as well? If a CPU is actually "stuck" in
> > RTAS at panic time, then leaving rtas.args untouched might make the
> > ibm,int-off, ibm,set-xive, ibm,os-term, and any other RTAS calls we
> > incur on the panic path more likely to succeed.

Yeah I think that's probably not a bad idea. Not sure if you've got the
bandwidth to take on doing the patch but be my guest if you do :)
Otherwise we can file it in github issues.

> Regardless, I request that we proceed with the revert while the crash
> path hardening gets sorted out. If I understand the motivation behind
> commit b664db8e3f97 ("powerpc/rtas: Implement reentrant rtas call"),
> then it looks like it was incomplete anyway? rtas_os_term() still takes
> the lock when calling ibm,os-term.

Yeah agree a simple revert should be done first.

Thanks,
Nick

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] Revert "powerpc/rtas: Implement reentrant rtas call"
  2022-09-20  3:54                 ` Nicholas Piggin
@ 2022-09-21 15:54                   ` Nathan Lynch
  0 siblings, 0 replies; 15+ messages in thread
From: Nathan Lynch @ 2022-09-21 15:54 UTC (permalink / raw)
  To: Nicholas Piggin, Leonardo Brás; +Cc: linuxppc-dev

"Nicholas Piggin" <npiggin@gmail.com> writes:

> On Mon Sep 19, 2022 at 11:51 PM AEST, Nathan Lynch wrote:
>> > I wonder - would it be worth making the panic path use a separate
>> > "emergency" rtas_args buffer as well? If a CPU is actually "stuck" in
>> > RTAS at panic time, then leaving rtas.args untouched might make the
>> > ibm,int-off, ibm,set-xive, ibm,os-term, and any other RTAS calls we
>> > incur on the panic path more likely to succeed.
>
> Yeah I think that's probably not a bad idea. Not sure if you've got the
> bandwidth to take on doing the patch but be my guest if you do :)
> Otherwise we can file it in github issues.

Not sure I'll be able to work it soon. I filed
https://github.com/linuxppc/issues/issues/435

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] Revert "powerpc/rtas: Implement reentrant rtas call"
  2022-09-07 22:01 [PATCH] Revert "powerpc/rtas: Implement reentrant rtas call" Nathan Lynch
  2022-09-08  7:56 ` Laurent Dufour
       [not found] ` <1d76891ee052112ee1547a4027e358d5cbcac23d.camel@gmail.com>
@ 2022-09-23 10:57 ` Michael Ellerman
  2 siblings, 0 replies; 15+ messages in thread
From: Michael Ellerman @ 2022-09-23 10:57 UTC (permalink / raw)
  To: Nathan Lynch, linuxppc-dev; +Cc: tyreld, ldufour, haren, leobras.c

On Wed, 7 Sep 2022 17:01:11 -0500, Nathan Lynch wrote:
> At the time this was submitted by Leonardo, I confirmed -- or thought
> I had confirmed -- with PowerVM partition firmware development that
> the following RTAS functions:
> 
> - ibm,get-xive
> - ibm,int-off
> - ibm,int-on
> - ibm,set-xive
> 
> [...]

Applied to powerpc/next.

[1/1] Revert "powerpc/rtas: Implement reentrant rtas call"
      https://git.kernel.org/powerpc/c/f88aabad33ea22be2ce1c60d8901942e4e2a9edb

cheers

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] Revert "powerpc/rtas: Implement reentrant rtas call"
  2022-09-16 21:56             ` Nathan Lynch
  2022-09-19 13:51               ` Nathan Lynch
@ 2023-04-14 14:20               ` Michal Suchánek
  2023-04-17 13:55                 ` Nathan Lynch
  1 sibling, 1 reply; 15+ messages in thread
From: Michal Suchánek @ 2023-04-14 14:20 UTC (permalink / raw)
  To: Nathan Lynch; +Cc: Leonardo Brás, linuxppc-dev, Nicholas Piggin

Hello,

On Fri, Sep 16, 2022 at 04:56:18PM -0500, Nathan Lynch wrote:
> "Nicholas Piggin" <npiggin@gmail.com> writes:
> > On Wed Sep 14, 2022 at 3:39 AM AEST, Leonardo Brás wrote:
> >> On Mon, 2022-09-12 at 14:58 -0500, Nathan Lynch wrote:
> >> > Leonardo Brás <leobras.c@gmail.com> writes:
> >> > > On Fri, 2022-09-09 at 09:04 -0500, Nathan Lynch wrote:

> >> > > > No, it means the premise of commit b664db8e3f97 ("powerpc/rtas:
> >> > > > Implement reentrant rtas call") change is incorrect. The "reentrant"
> >> > > > property described in the spec applies only to the individual RTAS
> >> > > > functions. The OS can invoke (for example) ibm,set-xive on multiple CPUs
> >> > > > simultaneously, but it must adhere to the more general requirement to
> >> > > > serialize with other RTAS functions.
> >> > > > 
> >> > > 
> >> > > I see. Thanks for explaining that part!
> >> > > I agree: reentrant calls that way don't look as useful on Linux than I
> >> > > previously thought.
> >> > > 
> >> > > OTOH, I think that instead of reverting the change, we could make use of the
> >> > > correct information and fix the current implementation. (This could help when we
> >> > > do the same rtas call in multiple cpus)
> >> > 
> >> > Hmm I'm happy to be mistaken here, but I doubt we ever really need to do
> >> > that. I'm not seeing the need.
> >> > 
> >> > > I have an idea of a patch to fix this. 
> >> > > Do you think it would be ok if I sent that, to prospect being an alternative to
> >> > > this reversion?
> >> > 
> >> > It is my preference, and I believe it is more common, to revert to the
> >> > well-understood prior state, imperfect as it may be. The revert can be
> >> > backported to -stable and distros while development and review of
> >> > another approach proceeds.
> >>
> >> Ok then, as long as you are aware of the kdump bug, I'm good.
> >>
> >> FWIW:
> >> Reviewed-by: Leonardo Bras <leobras.c@gmail.com>
> >
> > A shame. I guess a reader/writer lock would not be much help because
> > the crash is probably more likely to hit longer running rtas calls?
> >
> > Alternative is just cheat and do this...?
> >
> > Thanks,
> > Nick
> >
> > diff --git a/arch/powerpc/kernel/rtas.c b/arch/powerpc/kernel/rtas.c
> > index 693133972294..89728714a06e 100644
> > --- a/arch/powerpc/kernel/rtas.c
> > +++ b/arch/powerpc/kernel/rtas.c
> > @@ -26,6 +26,7 @@
> >  #include <linux/syscalls.h>
> >  #include <linux/of.h>
> >  #include <linux/of_fdt.h>
> > +#include <linux/panic.h>
> >  
> >  #include <asm/interrupt.h>
> >  #include <asm/rtas.h>
> > @@ -97,6 +98,19 @@ static unsigned long lock_rtas(void)
> >  {
> >         unsigned long flags;
> >  
> > +       if (atomic_read(&panic_cpu) == raw_smp_processor_id()) {
> > +               /*
> > +                * Crash in progress on this CPU. Other CPUs should be
> > +                * stopped by now, so skip the lock in case it was being
> > +                * held, and is now needed for crashing e.g., kexec
> > +                * (machine_kexec_mask_interrupts) requires rtas calls.
> > +                *
> > +                * It's possible this could have caused rtas state
> > breakage
> > +                * but the alternative is deadlock.
> > +                */
> > +               return 0;
> > +       }
> > +
> >         local_irq_save(flags);
> >         preempt_disable();
> >         arch_spin_lock(&rtas.lock);
> > @@ -105,6 +119,9 @@ static unsigned long lock_rtas(void)
> >  
> >  static void unlock_rtas(unsigned long flags)
> >  {
> > +       if (atomic_read(&panic_cpu) == raw_smp_processor_id())
> > +               return;
> > +
> >         arch_spin_unlock(&rtas.lock);
> >         local_irq_restore(flags);
> >         preempt_enable();
> 
> Looks correct.
> 
> I wonder - would it be worth making the panic path use a separate
> "emergency" rtas_args buffer as well? If a CPU is actually "stuck" in
> RTAS at panic time, then leaving rtas.args untouched might make the
> ibm,int-off, ibm,set-xive, ibm,os-term, and any other RTAS calls we
> incur on the panic path more likely to succeed.

Was some fix for the case of crashing in rtas merged?

Looks like there is none unless I missed something.

The paramater area allocator might help with the latter
but the former does not seem addressed.

Thanks

Michal

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] Revert "powerpc/rtas: Implement reentrant rtas call"
  2023-04-14 14:20               ` Michal Suchánek
@ 2023-04-17 13:55                 ` Nathan Lynch
  0 siblings, 0 replies; 15+ messages in thread
From: Nathan Lynch @ 2023-04-17 13:55 UTC (permalink / raw)
  To: Michal Suchánek; +Cc: Leonardo Brás, linuxppc-dev, Nicholas Piggin

Michal Suchánek <msuchanek@suse.de> writes:
> On Fri, Sep 16, 2022 at 04:56:18PM -0500, Nathan Lynch wrote:
>> "Nicholas Piggin" <npiggin@gmail.com> writes:
>> > On Wed Sep 14, 2022 at 3:39 AM AEST, Leonardo Brás wrote:
>> >> On Mon, 2022-09-12 at 14:58 -0500, Nathan Lynch wrote:
>> >> > Leonardo Brás <leobras.c@gmail.com> writes:
>> >> > > On Fri, 2022-09-09 at 09:04 -0500, Nathan Lynch wrote:
>
>> >> > > > No, it means the premise of commit b664db8e3f97 ("powerpc/rtas:
>> >> > > > Implement reentrant rtas call") change is incorrect. The "reentrant"
>> >> > > > property described in the spec applies only to the individual RTAS
>> >> > > > functions. The OS can invoke (for example) ibm,set-xive on multiple CPUs
>> >> > > > simultaneously, but it must adhere to the more general requirement to
>> >> > > > serialize with other RTAS functions.
>> >> > > > 
>> >> > > 
>> >> > > I see. Thanks for explaining that part!
>> >> > > I agree: reentrant calls that way don't look as useful on Linux than I
>> >> > > previously thought.
>> >> > > 
>> >> > > OTOH, I think that instead of reverting the change, we could make use of the
>> >> > > correct information and fix the current implementation. (This could help when we
>> >> > > do the same rtas call in multiple cpus)
>> >> > 
>> >> > Hmm I'm happy to be mistaken here, but I doubt we ever really need to do
>> >> > that. I'm not seeing the need.
>> >> > 
>> >> > > I have an idea of a patch to fix this. 
>> >> > > Do you think it would be ok if I sent that, to prospect being an alternative to
>> >> > > this reversion?
>> >> > 
>> >> > It is my preference, and I believe it is more common, to revert to the
>> >> > well-understood prior state, imperfect as it may be. The revert can be
>> >> > backported to -stable and distros while development and review of
>> >> > another approach proceeds.
>> >>
>> >> Ok then, as long as you are aware of the kdump bug, I'm good.
>> >>
>> >> FWIW:
>> >> Reviewed-by: Leonardo Bras <leobras.c@gmail.com>
>> >
>> > A shame. I guess a reader/writer lock would not be much help because
>> > the crash is probably more likely to hit longer running rtas calls?
>> >
>> > Alternative is just cheat and do this...?
>> >
>> > Thanks,
>> > Nick
>> >
>> > diff --git a/arch/powerpc/kernel/rtas.c b/arch/powerpc/kernel/rtas.c
>> > index 693133972294..89728714a06e 100644
>> > --- a/arch/powerpc/kernel/rtas.c
>> > +++ b/arch/powerpc/kernel/rtas.c
>> > @@ -26,6 +26,7 @@
>> >  #include <linux/syscalls.h>
>> >  #include <linux/of.h>
>> >  #include <linux/of_fdt.h>
>> > +#include <linux/panic.h>
>> >  
>> >  #include <asm/interrupt.h>
>> >  #include <asm/rtas.h>
>> > @@ -97,6 +98,19 @@ static unsigned long lock_rtas(void)
>> >  {
>> >         unsigned long flags;
>> >  
>> > +       if (atomic_read(&panic_cpu) == raw_smp_processor_id()) {
>> > +               /*
>> > +                * Crash in progress on this CPU. Other CPUs should be
>> > +                * stopped by now, so skip the lock in case it was being
>> > +                * held, and is now needed for crashing e.g., kexec
>> > +                * (machine_kexec_mask_interrupts) requires rtas calls.
>> > +                *
>> > +                * It's possible this could have caused rtas state
>> > breakage
>> > +                * but the alternative is deadlock.
>> > +                */
>> > +               return 0;
>> > +       }
>> > +
>> >         local_irq_save(flags);
>> >         preempt_disable();
>> >         arch_spin_lock(&rtas.lock);
>> > @@ -105,6 +119,9 @@ static unsigned long lock_rtas(void)
>> >  
>> >  static void unlock_rtas(unsigned long flags)
>> >  {
>> > +       if (atomic_read(&panic_cpu) == raw_smp_processor_id())
>> > +               return;
>> > +
>> >         arch_spin_unlock(&rtas.lock);
>> >         local_irq_restore(flags);
>> >         preempt_enable();
>> 
>> Looks correct.
>> 
>> I wonder - would it be worth making the panic path use a separate
>> "emergency" rtas_args buffer as well? If a CPU is actually "stuck" in
>> RTAS at panic time, then leaving rtas.args untouched might make the
>> ibm,int-off, ibm,set-xive, ibm,os-term, and any other RTAS calls we
>> incur on the panic path more likely to succeed.
>
> Was some fix for the case of crashing in rtas merged?
>
> Looks like there is none unless I missed something.

I'm not aware of any crashes in RTAS, but we do have an issue open to
track the issue I think you're referring to:

https://github.com/linuxppc/issues/issues/435

No progress yet. AFAIK only XICS guests are exposed; XIVE doesn't use
RTAS calls.

^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2023-04-17 13:56 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-09-07 22:01 [PATCH] Revert "powerpc/rtas: Implement reentrant rtas call" Nathan Lynch
2022-09-08  7:56 ` Laurent Dufour
     [not found] ` <1d76891ee052112ee1547a4027e358d5cbcac23d.camel@gmail.com>
2022-09-09 14:04   ` Nathan Lynch
2022-09-12 15:22     ` Leonardo Brás
2022-09-12 19:58       ` Nathan Lynch
2022-09-13 17:39         ` Leonardo Brás
2022-09-16  1:31           ` Nicholas Piggin
2022-09-16 21:56             ` Nathan Lynch
2022-09-19 13:51               ` Nathan Lynch
2022-09-19 23:45                 ` Michael Ellerman
2022-09-20  3:54                 ` Nicholas Piggin
2022-09-21 15:54                   ` Nathan Lynch
2023-04-14 14:20               ` Michal Suchánek
2023-04-17 13:55                 ` Nathan Lynch
2022-09-23 10:57 ` Michael Ellerman

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).