* [PATCH] PSeries: Cancel RTAS event scan before firmware flash
@ 2011-09-21 10:29 Ravi K Nittala
  2011-09-23  0:38 ` Benjamin Herrenschmidt
  0 siblings, 1 reply; 11+ messages in thread
From: Ravi K Nittala @ 2011-09-21 10:29 UTC (permalink / raw)
  To: benh; +Cc: linuxppc-dev

The RTAS firmware flash update is conducted using an RTAS call that is
serialized by lock_rtas(), which uses a spin_lock. While the flash is in
progress, rtasd keeps scanning for RTAS events generated on the machine.
The scan runs from a delayed work item on a workqueue. rtas_event_scan()
also uses an RTAS call to scan the events, so it tries to acquire the
same spin_lock before issuing the request.

The flash update takes a while to complete and during this time, any other
RTAS call has to wait. In this case, rtas_event_scan() waits for a long time
on the spin_lock resulting in a soft lockup.

Fix: Just before the flash update is performed, the queued rtas_event_scan()
work item is cancelled from the work queue so that there is no other RTAS
call issued while the flash is in progress. After the flash completes, the
system reboots and the rtas_event_scan() is rescheduled.

Signed-off-by: Suzuki Poulose <suzuki@in.ibm.com>
Signed-off-by: Ravi Nittala <ravi.nittala@in.ibm.com>

---

 arch/powerpc/include/asm/rtas.h  |    4 ++++
 arch/powerpc/kernel/rtas_flash.c |    8 ++++++++
 arch/powerpc/kernel/rtasd.c      |    6 ++++++
 3 files changed, 18 insertions(+), 0 deletions(-)

diff --git a/arch/powerpc/include/asm/rtas.h b/arch/powerpc/include/asm/rtas.h
index 58625d1..b5cbd9f 100644
--- a/arch/powerpc/include/asm/rtas.h
+++ b/arch/powerpc/include/asm/rtas.h
@@ -245,6 +245,10 @@ extern int early_init_dt_scan_rtas(unsigned long node,
 
 extern void pSeries_log_error(char *buf, unsigned int err_type, int fatal);
 
+#ifdef CONFIG_PPC_RTAS_DAEMON
+extern bool rtas_cancel_event_scan(void);
+#endif
+
 /* Error types logged.  */
 #define ERR_FLAG_ALREADY_LOGGED	0x0
 #define ERR_FLAG_BOOT		0x1 	/* log was pulled from NVRAM on boot */
diff --git a/arch/powerpc/kernel/rtas_flash.c b/arch/powerpc/kernel/rtas_flash.c
index e037c74..a9cceff 100644
--- a/arch/powerpc/kernel/rtas_flash.c
+++ b/arch/powerpc/kernel/rtas_flash.c
@@ -567,6 +567,14 @@ static void rtas_flash_firmware(int reboot_type)
 		return;
 	}
 
+#ifdef CONFIG_PPC_RTAS_DAEMON
+	/*
+	 * Just before starting the firmware flash, cancel the event scan work
+	 * to avoid any soft lockup issues.
+	 */
+	rtas_cancel_event_scan();
+#endif
+
 	/*
 	 * NOTE: the "first" block must be under 4GB, so we create
 	 * an entry with no data blocks in the reserved buffer in
diff --git a/arch/powerpc/kernel/rtasd.c b/arch/powerpc/kernel/rtasd.c
index 481ef06..e8f03fa 100644
--- a/arch/powerpc/kernel/rtasd.c
+++ b/arch/powerpc/kernel/rtasd.c
@@ -472,6 +472,12 @@ static void start_event_scan(void)
 				 &event_scan_work, event_scan_delay);
 }
 
+/* Cancel the rtas event scan work */
+bool rtas_cancel_event_scan(void)
+{
+	return cancel_delayed_work_sync(&event_scan_work);
+}
+
 static int __init rtas_init(void)
 {
 	struct proc_dir_entry *entry;

* Re: [PATCH] PSeries: Cancel RTAS event scan before firmware flash
  2011-09-21 10:29 [PATCH] PSeries: Cancel RTAS event scan before firmware flash Ravi K Nittala
@ 2011-09-23  0:38 ` Benjamin Herrenschmidt
  0 siblings, 0 replies; 11+ messages in thread
From: Benjamin Herrenschmidt @ 2011-09-23  0:38 UTC (permalink / raw)
  To: Ravi K Nittala; +Cc: linuxppc-dev

On Wed, 2011-09-21 at 15:59 +0530, Ravi K Nittala wrote:
> The RTAS firmware flash update is conducted using an RTAS call that is
> serialized by lock_rtas() which uses spin_lock. While the flash is in
> progress, rtasd performs scan for any RTAS events that are generated by
> the system. rtasd keeps scanning for the RTAS events generated on the
> machine. This is performed via workqueue mechanism. The rtas_event_scan()
> also uses an RTAS call to scan the events, eventually trying to acquire
> the spin_lock before issuing the request.

Better. However:

> diff --git a/arch/powerpc/include/asm/rtas.h b/arch/powerpc/include/asm/rtas.h
> index 58625d1..b5cbd9f 100644
> --- a/arch/powerpc/include/asm/rtas.h
> +++ b/arch/powerpc/include/asm/rtas.h
> @@ -245,6 +245,10 @@ extern int early_init_dt_scan_rtas(unsigned long node,
>  
>  extern void pSeries_log_error(char *buf, unsigned int err_type, int fatal);
>  
> +#ifdef CONFIG_PPC_RTAS_DAEMON
> +extern bool rtas_cancel_event_scan(void);
> +#endif

The extern as such doesn't need an ifdef... however, you could avoid
this one:

 .../...
>  
> +#ifdef CONFIG_PPC_RTAS_DAEMON
> +	/*
> +	 * Just before starting the firmware flash, cancel the event scan work
> +	 * to avoid any soft lockup issues.
> +	 */
> +	rtas_cancel_event_scan();
> +#endif
> +

Here, by having the header contain instead:

#ifdef CONFIG_PPC_RTAS_DAEMON
extern void rtas_cancel_event_scan(void);
#else
static inline void rtas_cancel_event_scan(void) { }
#endif
 
Also note that I removed the bool, it's not useful since you don't
test it anyway.
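
For illustration only, the stub pattern can be tried outside the kernel.
The sketch below is plain user-space C, not kernel code:
CONFIG_DEMO_DAEMON, demo_cancel_event_scan() and demo_flash_firmware()
are made-up names standing in for the Kconfig symbol, the rtasd helper
and the flash path. With the symbol undefined, the call compiles to
nothing and the caller needs no #ifdef.

```c
/*
 * User-space sketch of the "real function or empty inline stub" pattern.
 * All names here are hypothetical; nothing below is kernel API.
 */

/* Uncomment (or build with -DCONFIG_DEMO_DAEMON) to get the real function. */
/* #define CONFIG_DEMO_DAEMON 1 */

/* Counts how often the real cancel function ran. */
static int cancel_calls;

#ifdef CONFIG_DEMO_DAEMON
/* "Daemon" built in: the function does real work. */
void demo_cancel_event_scan(void)
{
	cancel_calls++;
}
#else
/* "Daemon" not built: an empty stub keeps callers free of #ifdef. */
static inline void demo_cancel_event_scan(void) { }
#endif

/* Stand-in for the flash path; the call site is identical in both builds. */
int demo_flash_firmware(void)
{
	demo_cancel_event_scan();
	return cancel_calls;
}
```

As written, demo_flash_firmware() returns 0 because the stub does
nothing. Built with -DCONFIG_DEMO_DAEMON, the first call returns 1.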

>  	 * NOTE: the "first" block must be under 4GB, so we create
>  	 * an entry with no data blocks in the reserved buffer in
> diff --git a/arch/powerpc/kernel/rtasd.c b/arch/powerpc/kernel/rtasd.c
> index 481ef06..e8f03fa 100644
> --- a/arch/powerpc/kernel/rtasd.c
> +++ b/arch/powerpc/kernel/rtasd.c
> @@ -472,6 +472,12 @@ static void start_event_scan(void)
>  				 &event_scan_work, event_scan_delay);
>  }
>  
> +/* Cancel the rtas event scan work */
> +bool rtas_cancel_event_scan(void)
> +{
> +	return cancel_delayed_work_sync(&event_scan_work);
> +}

Finally, the above is missing an EXPORT_SYMBOL_GPL() since rtas
flash can be a module.

Cheers,
Ben.

>  static int __init rtas_init(void)
>  {
>  	struct proc_dir_entry *entry;

* Re: [PATCH] PSeries: Cancel RTAS event scan before firmware flash
  2011-10-04  7:49 Ravi K Nittala
@ 2011-10-04  7:54 ` Subrata Modak
  0 siblings, 0 replies; 11+ messages in thread
From: Subrata Modak @ 2011-10-04  7:54 UTC (permalink / raw)
  To: Ravi K Nittala
  Cc: Anton Blanchard, naveedaus, Michael Neuling, Pavaman, suzuki,
	ranittal, Vishu, linuxppc-dev, Divya Vikas

Also,

On Tue, 2011-10-04 at 13:19 +0530, Ravi K Nittala wrote:
> The RTAS firmware flash update is conducted using an RTAS call that is
> serialized by lock_rtas() which uses spin_lock. While the flash is in
> progress, rtasd performs scan for any RTAS events that are generated by
> the system. rtasd keeps scanning for the RTAS events generated on the
> machine. This is performed via workqueue mechanism. The rtas_event_scan()
> also uses an RTAS call to scan the events, eventually trying to acquire
> the spin_lock before issuing the request.
> 
> The flash update takes a while to complete and during this time, any other
> RTAS call has to wait. In this case, rtas_event_scan() waits for a long time
> on the spin_lock resulting in a soft lockup.
> 
> Fix: Just before the flash update is performed, the queued rtas_event_scan()
> work item is cancelled from the work queue so that there is no other RTAS
> call issued while the flash is in progress. After the flash completes, the
> system reboots and the rtas_event_scan() is rescheduled.
> 
> Signed-off-by: Suzuki Poulose <suzuki@in.ibm.com>
> Signed-off-by: Ravi Nittala <ravi.nittala@in.ibm.com>

Reported-by: Divya Vikas <divya.vikas@in.ibm.com>

Regards--
Subrata

> 
> ---
>  arch/powerpc/include/asm/rtas.h  |    6 ++++++
>  arch/powerpc/kernel/rtas_flash.c |    6 ++++++
>  arch/powerpc/kernel/rtasd.c      |    7 +++++++
>  3 files changed, 19 insertions(+), 0 deletions(-)
> 
> diff --git a/arch/powerpc/include/asm/rtas.h b/arch/powerpc/include/asm/rtas.h
> index 58625d1..754723b 100644
> --- a/arch/powerpc/include/asm/rtas.h
> +++ b/arch/powerpc/include/asm/rtas.h
> @@ -245,6 +245,12 @@ extern int early_init_dt_scan_rtas(unsigned long node,
> 
>  extern void pSeries_log_error(char *buf, unsigned int err_type, int fatal);
> 
> +#ifdef CONFIG_PPC_RTAS_DAEMON
> +extern void rtas_cancel_event_scan(void);
> +#else
> +static inline void rtas_cancel_event_scan(void) { }
> +#endif
> +
>  /* Error types logged.  */
>  #define ERR_FLAG_ALREADY_LOGGED	0x0
>  #define ERR_FLAG_BOOT		0x1 	/* log was pulled from NVRAM on boot */
> diff --git a/arch/powerpc/kernel/rtas_flash.c b/arch/powerpc/kernel/rtas_flash.c
> index e037c74..4174b4b 100644
> --- a/arch/powerpc/kernel/rtas_flash.c
> +++ b/arch/powerpc/kernel/rtas_flash.c
> @@ -568,6 +568,12 @@ static void rtas_flash_firmware(int reboot_type)
>  	}
> 
>  	/*
> +	 * Just before starting the firmware flash, cancel the event scan work
> +	 * to avoid any soft lockup issues.
> +	 */
> +	rtas_cancel_event_scan();
> +
> +	/*
>  	 * NOTE: the "first" block must be under 4GB, so we create
>  	 * an entry with no data blocks in the reserved buffer in
>  	 * the kernel data segment.
> diff --git a/arch/powerpc/kernel/rtasd.c b/arch/powerpc/kernel/rtasd.c
> index 481ef06..1045ff4 100644
> --- a/arch/powerpc/kernel/rtasd.c
> +++ b/arch/powerpc/kernel/rtasd.c
> @@ -472,6 +472,13 @@ static void start_event_scan(void)
>  				 &event_scan_work, event_scan_delay);
>  }
> 
> +/* Cancel the rtas event scan work */
> +void rtas_cancel_event_scan(void)
> +{
> +	cancel_delayed_work_sync(&event_scan_work);
> +}
> +EXPORT_SYMBOL_GPL(rtas_cancel_event_scan);
> +
>  static int __init rtas_init(void)
>  {
>  	struct proc_dir_entry *entry;
> 

* [PATCH] PSeries: Cancel RTAS event scan before firmware flash
@ 2011-10-04  7:49 Ravi K Nittala
  2011-10-04  7:54 ` Subrata Modak
  0 siblings, 1 reply; 11+ messages in thread
From: Ravi K Nittala @ 2011-10-04  7:49 UTC (permalink / raw)
  To: benh
  Cc: Anton Blanchard, Subrata Modak, Michael Neuling, suzuki,
	ranittal, linuxppc-dev, Divya Vikas

The RTAS firmware flash update is conducted using an RTAS call that is
serialized by lock_rtas(), which uses a spin_lock. While the flash is in
progress, rtasd keeps scanning for RTAS events generated on the machine.
The scan runs from a delayed work item on a workqueue. rtas_event_scan()
also uses an RTAS call to scan the events, so it tries to acquire the
same spin_lock before issuing the request.

The flash update takes a while to complete and during this time, any other
RTAS call has to wait. In this case, rtas_event_scan() waits for a long time
on the spin_lock resulting in a soft lockup.

Fix: Just before the flash update is performed, the queued rtas_event_scan()
work item is cancelled from the work queue so that there is no other RTAS
call issued while the flash is in progress. After the flash completes, the
system reboots and the rtas_event_scan() is rescheduled.

Signed-off-by: Suzuki Poulose <suzuki@in.ibm.com>
Signed-off-by: Ravi Nittala <ravi.nittala@in.ibm.com>

---
 arch/powerpc/include/asm/rtas.h  |    6 ++++++
 arch/powerpc/kernel/rtas_flash.c |    6 ++++++
 arch/powerpc/kernel/rtasd.c      |    7 +++++++
 3 files changed, 19 insertions(+), 0 deletions(-)

diff --git a/arch/powerpc/include/asm/rtas.h b/arch/powerpc/include/asm/rtas.h
index 58625d1..754723b 100644
--- a/arch/powerpc/include/asm/rtas.h
+++ b/arch/powerpc/include/asm/rtas.h
@@ -245,6 +245,12 @@ extern int early_init_dt_scan_rtas(unsigned long node,
 
 extern void pSeries_log_error(char *buf, unsigned int err_type, int fatal);
 
+#ifdef CONFIG_PPC_RTAS_DAEMON
+extern void rtas_cancel_event_scan(void);
+#else
+static inline void rtas_cancel_event_scan(void) { }
+#endif
+
 /* Error types logged.  */
 #define ERR_FLAG_ALREADY_LOGGED	0x0
 #define ERR_FLAG_BOOT		0x1 	/* log was pulled from NVRAM on boot */
diff --git a/arch/powerpc/kernel/rtas_flash.c b/arch/powerpc/kernel/rtas_flash.c
index e037c74..4174b4b 100644
--- a/arch/powerpc/kernel/rtas_flash.c
+++ b/arch/powerpc/kernel/rtas_flash.c
@@ -568,6 +568,12 @@ static void rtas_flash_firmware(int reboot_type)
 	}
 
 	/*
+	 * Just before starting the firmware flash, cancel the event scan work
+	 * to avoid any soft lockup issues.
+	 */
+	rtas_cancel_event_scan();
+
+	/*
 	 * NOTE: the "first" block must be under 4GB, so we create
 	 * an entry with no data blocks in the reserved buffer in
 	 * the kernel data segment.
diff --git a/arch/powerpc/kernel/rtasd.c b/arch/powerpc/kernel/rtasd.c
index 481ef06..1045ff4 100644
--- a/arch/powerpc/kernel/rtasd.c
+++ b/arch/powerpc/kernel/rtasd.c
@@ -472,6 +472,13 @@ static void start_event_scan(void)
 				 &event_scan_work, event_scan_delay);
 }
 
+/* Cancel the rtas event scan work */
+void rtas_cancel_event_scan(void)
+{
+	cancel_delayed_work_sync(&event_scan_work);
+}
+EXPORT_SYMBOL_GPL(rtas_cancel_event_scan);
+
 static int __init rtas_init(void)
 {
 	struct proc_dir_entry *entry;

* Re: [PATCH] PSeries: Cancel RTAS event scan before firmware flash
  2011-08-30  6:21       ` Benjamin Herrenschmidt
@ 2011-08-30  6:57         ` Suzuki Poulose
  0 siblings, 0 replies; 11+ messages in thread
From: Suzuki Poulose @ 2011-08-30  6:57 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: mikey, Ravi K. Nittala, sbest, antonb, subrata.modak, ranittal,
	linuxppc-dev, divya.vikas

On 08/30/11 11:51, Benjamin Herrenschmidt wrote:
> On Tue, 2011-08-30 at 16:19 +1000, Benjamin Herrenschmidt wrote:
>> On Tue, 2011-08-30 at 11:47 +0530, Suzuki Poulose wrote:
>>>>
>>>
>>> The flash operation is performed in the reboot path at the very end.
>>> So, even if we restart the event scan, the thread may not be able to
>>> process
>>> the events. Hence we thought we would leave it stopped.
>>>
>>> Again, we do not have much expertise in deciding which is the best
>>> thing to do.
>>> We could resume the event scan, if you think that is needed.
>>>
>>> Thanks for the review.
>>
>> No that's ok, I'll merge the patch as-is then.
>
> Actually, please dbl check you get the dependencies right. The event
> scan stuff is only compiled if CONFIG_PPC_RTAS_DAEMON is set, but the
> rtas flash code depends on a different config option that can be set
> independently.
>
> So at the very least you need an ifdef to guard the cross-call

Thanks for catching this ! Will address this in the next version.

Thanks
Suzuki

* Re: [PATCH] PSeries: Cancel RTAS event scan before firmware flash
  2011-08-30  6:19     ` Benjamin Herrenschmidt
@ 2011-08-30  6:21       ` Benjamin Herrenschmidt
  2011-08-30  6:57         ` Suzuki Poulose
  0 siblings, 1 reply; 11+ messages in thread
From: Benjamin Herrenschmidt @ 2011-08-30  6:21 UTC (permalink / raw)
  To: Suzuki Poulose
  Cc: mikey, Ravi K. Nittala, sbest, antonb, subrata.modak, ranittal,
	linuxppc-dev, divya.vikas

On Tue, 2011-08-30 at 16:19 +1000, Benjamin Herrenschmidt wrote:
> On Tue, 2011-08-30 at 11:47 +0530, Suzuki Poulose wrote:
> > >
> > 
> > The flash operation is performed in the reboot path at the very end.
> > So, even if we restart the event scan, the thread may not be able to
> > process
> > the events. Hence we thought we would leave it stopped.
> > 
> > Again, we do not have much expertise in deciding which is the best
> > thing to do.
> > We could resume the event scan, if you think that is needed.
> > 
> > Thanks for the review. 
> 
> No that's ok, I'll merge the patch as-is then.

Actually, please double-check that you get the dependencies right. The
event scan stuff is only compiled if CONFIG_PPC_RTAS_DAEMON is set, but
the rtas flash code depends on a different config option that can be set
independently.

So at the very least you need an ifdef to guard the cross-call.

Cheers,
Ben.

* Re: [PATCH] PSeries: Cancel RTAS event scan before firmware flash
  2011-08-30  6:17   ` Suzuki Poulose
@ 2011-08-30  6:19     ` Benjamin Herrenschmidt
  2011-08-30  6:21       ` Benjamin Herrenschmidt
  0 siblings, 1 reply; 11+ messages in thread
From: Benjamin Herrenschmidt @ 2011-08-30  6:19 UTC (permalink / raw)
  To: Suzuki Poulose
  Cc: mikey, Ravi K. Nittala, sbest, antonb, subrata.modak, ranittal,
	linuxppc-dev, divya.vikas

On Tue, 2011-08-30 at 11:47 +0530, Suzuki Poulose wrote:
> >
> 
> The flash operation is performed in the reboot path at the very end.
> So, even if we restart the event scan, the thread may not be able to
> process
> the events. Hence we thought we would leave it stopped.
> 
> Again, we do not have much expertise in deciding which is the best
> thing to do.
> We could resume the event scan, if you think that is needed.
> 
> Thanks for the review. 

No that's ok, I'll merge the patch as-is then.

Cheers,
Ben.

* Re: [PATCH] PSeries: Cancel RTAS event scan before firmware flash
  2011-08-30  6:03 ` Benjamin Herrenschmidt
@ 2011-08-30  6:17   ` Suzuki Poulose
  2011-08-30  6:19     ` Benjamin Herrenschmidt
  0 siblings, 1 reply; 11+ messages in thread
From: Suzuki Poulose @ 2011-08-30  6:17 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: mikey, Ravi K. Nittala, sbest, antonb, subrata.modak, ranittal,
	linuxppc-dev, divya.vikas

On 08/30/11 11:33, Benjamin Herrenschmidt wrote:
> On Wed, 2011-07-27 at 17:39 +0530, Ravi K. Nittala wrote:
>> The firmware flash update is conducted using an RTAS call, that is serialized
>> by lock_rtas() which uses spin_lock. rtasd keeps scanning for the RTAS events
>> generated on the machine. This is performed via a delayed workqueue, invoking
>> an RTAS call to scan the events.
>>
>> The flash update takes a while to complete and during this time, any other
>> RTAS call has to wait. In this case, rtas_event_scan() waits for a long time
>> on the spin_lock resulting in a soft lockup.
>>
>> Approaches to fix the issue :
>>
>> Approach 1: Stop all the other CPUs before we start flashing the firmware.
>>
>> Before the rtas firmware update starts, all other CPUs should be stopped.
>> Which means no other CPU should be in lock_rtas(). We do not want other CPUs
>> execute while FW update is in progress and the system will be rebooted anyway
>> after the update.
>
> Shouldn't we resume the event scan after the flash ?
>

The flash operation is performed in the reboot path at the very end.
So, even if we restart the event scan, the thread may not be able to process
the events. Hence we thought we would leave it stopped.

Again, we do not have much expertise in deciding which is the best thing to do.
We could resume the event scan, if you think that is needed.
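
The effect of cancelling the scan and leaving it stopped can be pictured
with a toy model. This is a single-threaded user-space sketch, not the
kernel workqueue API: the toy_* names, the tick counter and the 2-tick
period are all assumptions made for the example. Each scan that fires
during the simulated flash stands for one rtas_event_scan() that would
have had to wait on the RTAS lock.

```c
#include <stdbool.h>

/* Toy model of one self-rearming delayed work item (hypothetical names). */
struct toy_delayed_work {
	bool pending;          /* queued and waiting for its deadline */
	unsigned long expires; /* tick at which the handler should run */
	unsigned long runs;    /* how many times the handler has run */
};

/* Queue the item to run 'delay' ticks after 'now'. */
void toy_queue(struct toy_delayed_work *w, unsigned long now,
	       unsigned long delay)
{
	w->pending = true;
	w->expires = now + delay;
}

/* Remove a pending item; returns true if something was cancelled. */
bool toy_cancel(struct toy_delayed_work *w)
{
	bool was_pending = w->pending;

	w->pending = false;
	return was_pending;
}

/* Advance to tick 'now': run the handler if due, then re-arm it. */
void toy_tick(struct toy_delayed_work *w, unsigned long now,
	      unsigned long delay)
{
	if (w->pending && now >= w->expires) {
		w->runs++;
		toy_queue(w, now, delay);
	}
}

/* Count the scans that fire during a "flash" lasting 'flash_ticks'. */
unsigned long toy_scans_during_flash(bool cancel_first,
				     unsigned long flash_ticks)
{
	struct toy_delayed_work scan = { 0 };
	unsigned long now;

	toy_queue(&scan, 0, 2);		/* scan every 2 ticks */
	if (cancel_first)
		toy_cancel(&scan);	/* never re-queued afterwards */

	for (now = 1; now <= flash_ticks; now++)
		toy_tick(&scan, now, 2);

	return scan.runs;
}
```

In this model a 10-tick flash sees five scans if nothing is cancelled
and none if the item is cancelled first. Since the machine reboots
afterwards, nothing in the model needs to re-queue it.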

Thanks for the review.

Suzuki

* Re: [PATCH] PSeries: Cancel RTAS event scan before firmware flash
  2011-07-27 12:09 Ravi K. Nittala
  2011-08-08  6:46 ` Suzuki Poulose
@ 2011-08-30  6:03 ` Benjamin Herrenschmidt
  2011-08-30  6:17   ` Suzuki Poulose
  1 sibling, 1 reply; 11+ messages in thread
From: Benjamin Herrenschmidt @ 2011-08-30  6:03 UTC (permalink / raw)
  To: Ravi K. Nittala
  Cc: antonb, sbest, mikey, subrata.modak, suzuki, ranittal,
	linuxppc-dev, divya.vikas

On Wed, 2011-07-27 at 17:39 +0530, Ravi K. Nittala wrote:
> The firmware flash update is conducted using an RTAS call, that is serialized
> by lock_rtas() which uses spin_lock. rtasd keeps scanning for the RTAS events
> generated on the machine. This is performed via a delayed workqueue, invoking
> an RTAS call to scan the events.
> 
> The flash update takes a while to complete and during this time, any other
> RTAS call has to wait. In this case, rtas_event_scan() waits for a long time
> on the spin_lock resulting in a soft lockup.
> 
> Approaches to fix the issue :
> 
> Approach 1: Stop all the other CPUs before we start flashing the firmware.
> 
> Before the rtas firmware update starts, all other CPUs should be stopped.
> Which means no other CPU should be in lock_rtas(). We do not want other CPUs
> execute while FW update is in progress and the system will be rebooted anyway
> after the update.

Shouldn't we resume the event scan after the flash ?

Apart from that, no objection to the approach.

Cheers,
Ben.

> --- arch/powerpc/kernel/setup-common.c.orig    2011-07-01 22:41:12.952507971 -0400
> +++ arch/powerpc/kernel/setup-common.c    2011-07-01 22:48:31.182507915 -0400
> @@ -109,11 +109,12 @@ void machine_shutdown(void)
>   void machine_restart(char *cmd)
>   {
>       machine_shutdown();
> -    if (ppc_md.restart)
> -        ppc_md.restart(cmd);
>   #ifdef CONFIG_SMP
> -    smp_send_stop();
> +        smp_send_stop();
>   #endif
> +    if (ppc_md.restart)
> +        ppc_md.restart(cmd);
> +
>       printk(KERN_EMERG "System Halted, OK to turn off power\n");
>       local_irq_disable();
>       while (1) ;
> 
> Problems with this approach:
> Stopping the CPUs suddenly may cause other serious problems depending on what
> was running on them. Hence, this approach cannot be considered.
> 
> 
> Approach 2: Cancel the rtas_scan_event work before starting the firmware flash.
> 
> Just before the flash update is performed, the queued rtas_event_scan() work
> item is cancelled from the work queue so that there is no other RTAS call
> issued while the flash is in progress. After the flash completes, the system
> reboots and the rtas_event_scan() is rescheduled.
> 
> Approach 2 looks to be a better solution than Approach 1. Kindly let us know
> your thoughts. Patch attached.
> 
> 
> Signed-off-by: Suzuki Poulose <suzuki@in.ibm.com>
> Signed-off-by: Ravi Nittala <ravi.nittala@in.ibm.com>
> 
> 
> ---
>  arch/powerpc/include/asm/rtas.h  |    2 ++
>  arch/powerpc/kernel/rtas_flash.c |    6 ++++++
>  arch/powerpc/kernel/rtasd.c      |    6 ++++++
>  3 files changed, 14 insertions(+), 0 deletions(-)
> 
> diff --git a/arch/powerpc/include/asm/rtas.h b/arch/powerpc/include/asm/rtas.h
> index 58625d1..3f26f87 100644
> --- a/arch/powerpc/include/asm/rtas.h
> +++ b/arch/powerpc/include/asm/rtas.h
> @@ -245,6 +245,8 @@ extern int early_init_dt_scan_rtas(unsigned long node,
>  
>  extern void pSeries_log_error(char *buf, unsigned int err_type, int fatal);
>  
> +extern bool rtas_cancel_event_scan(void);
> +
>  /* Error types logged.  */
>  #define ERR_FLAG_ALREADY_LOGGED	0x0
>  #define ERR_FLAG_BOOT		0x1 	/* log was pulled from NVRAM on boot */
> diff --git a/arch/powerpc/kernel/rtas_flash.c b/arch/powerpc/kernel/rtas_flash.c
> index e037c74..4174b4b 100644
> --- a/arch/powerpc/kernel/rtas_flash.c
> +++ b/arch/powerpc/kernel/rtas_flash.c
> @@ -568,6 +568,12 @@ static void rtas_flash_firmware(int reboot_type)
>  	}
>  
>  	/*
> +	 * Just before starting the firmware flash, cancel the event scan work
> +	 * to avoid any soft lockup issues.
> +	 */
> +	rtas_cancel_event_scan();
> +
> +	/*
>  	 * NOTE: the "first" block must be under 4GB, so we create
>  	 * an entry with no data blocks in the reserved buffer in
>  	 * the kernel data segment.
> diff --git a/arch/powerpc/kernel/rtasd.c b/arch/powerpc/kernel/rtasd.c
> index 481ef06..e8f03fa 100644
> --- a/arch/powerpc/kernel/rtasd.c
> +++ b/arch/powerpc/kernel/rtasd.c
> @@ -472,6 +472,12 @@ static void start_event_scan(void)
>  				 &event_scan_work, event_scan_delay);
>  }
>  
> +/* Cancel the rtas event scan work */
> +bool rtas_cancel_event_scan(void)
> +{
> +	return cancel_delayed_work_sync(&event_scan_work);
> +}
> +
>  static int __init rtas_init(void)
>  {
>  	struct proc_dir_entry *entry;
> 

* Re: [PATCH] PSeries: Cancel RTAS event scan before firmware flash
  2011-07-27 12:09 Ravi K. Nittala
@ 2011-08-08  6:46 ` Suzuki Poulose
  2011-08-30  6:03 ` Benjamin Herrenschmidt
  1 sibling, 0 replies; 11+ messages in thread
From: Suzuki Poulose @ 2011-08-08  6:46 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: mikey, sbest, Ravi K. Nittala, antonb, subrata.modak, ranittal,
	linuxppc-dev, divya.vikas


On 07/27/11 17:39, Ravi K. Nittala wrote:
> The firmware flash update is conducted using an RTAS call, that is serialized
> by lock_rtas() which uses spin_lock. rtasd keeps scanning for the RTAS events
> generated on the machine. This is performed via a delayed workqueue, invoking
> an RTAS call to scan the events.
>
> The flash update takes a while to complete and during this time, any other
> RTAS call has to wait. In this case, rtas_event_scan() waits for a long time
> on the spin_lock resulting in a soft lockup.
>
> Approaches to fix the issue :
>
> Approach 1: Stop all the other CPUs before we start flashing the firmware.
>
> Before the rtas firmware update starts, all other CPUs should be stopped.
> Which means no other CPU should be in lock_rtas(). We do not want other CPUs
> execute while FW update is in progress and the system will be rebooted anyway
> after the update.
>
> --- arch/powerpc/kernel/setup-common.c.orig    2011-07-01 22:41:12.952507971 -0400
> +++ arch/powerpc/kernel/setup-common.c    2011-07-01 22:48:31.182507915 -0400
> @@ -109,11 +109,12 @@ void machine_shutdown(void)
>    void machine_restart(char *cmd)
>    {
>        machine_shutdown();
> -    if (ppc_md.restart)
> -        ppc_md.restart(cmd);
>    #ifdef CONFIG_SMP
> -    smp_send_stop();
> +        smp_send_stop();
>    #endif
> +    if (ppc_md.restart)
> +        ppc_md.restart(cmd);
> +
>        printk(KERN_EMERG "System Halted, OK to turn off power\n");
>        local_irq_disable();
>        while (1) ;
>
> Problems with this approach:
> Stopping the CPUs suddenly may cause other serious problems depending on what
> was running on them. Hence, this approach cannot be considered.
>
>
> Approach 2: Cancel the rtas_scan_event work before starting the firmware flash.
>
> Just before the flash update is performed, the queued rtas_event_scan() work
> item is cancelled from the work queue so that there is no other RTAS call
> issued while the flash is in progress. After the flash completes, the system
> reboots and the rtas_event_scan() is rescheduled.
>
> Approach 2 looks to be a better solution than Approach 1. Kindly let us know
> your thoughts. Patch attached.
>

Ben,

Could you please let us know your thoughts about the following patch ?

Thanks
Suzuki


>
> Signed-off-by: Suzuki Poulose <suzuki@in.ibm.com>
> Signed-off-by: Ravi Nittala <ravi.nittala@in.ibm.com>
>
>
> ---
>   arch/powerpc/include/asm/rtas.h  |    2 ++
>   arch/powerpc/kernel/rtas_flash.c |    6 ++++++
>   arch/powerpc/kernel/rtasd.c      |    6 ++++++
>   3 files changed, 14 insertions(+), 0 deletions(-)
>
> diff --git a/arch/powerpc/include/asm/rtas.h b/arch/powerpc/include/asm/rtas.h
> index 58625d1..3f26f87 100644
> --- a/arch/powerpc/include/asm/rtas.h
> +++ b/arch/powerpc/include/asm/rtas.h
> @@ -245,6 +245,8 @@ extern int early_init_dt_scan_rtas(unsigned long node,
>
>   extern void pSeries_log_error(char *buf, unsigned int err_type, int fatal);
>
> +extern bool rtas_cancel_event_scan(void);
> +
>   /* Error types logged.  */
>   #define ERR_FLAG_ALREADY_LOGGED	0x0
>   #define ERR_FLAG_BOOT		0x1 	/* log was pulled from NVRAM on boot */
> diff --git a/arch/powerpc/kernel/rtas_flash.c b/arch/powerpc/kernel/rtas_flash.c
> index e037c74..4174b4b 100644
> --- a/arch/powerpc/kernel/rtas_flash.c
> +++ b/arch/powerpc/kernel/rtas_flash.c
> @@ -568,6 +568,12 @@ static void rtas_flash_firmware(int reboot_type)
>   	}
>
>   	/*
> +	 * Just before starting the firmware flash, cancel the event scan work
> +	 * to avoid any soft lockup issues.
> +	 */
> +	rtas_cancel_event_scan();
> +
> +	/*
>   	 * NOTE: the "first" block must be under 4GB, so we create
>   	 * an entry with no data blocks in the reserved buffer in
>   	 * the kernel data segment.
> diff --git a/arch/powerpc/kernel/rtasd.c b/arch/powerpc/kernel/rtasd.c
> index 481ef06..e8f03fa 100644
> --- a/arch/powerpc/kernel/rtasd.c
> +++ b/arch/powerpc/kernel/rtasd.c
> @@ -472,6 +472,12 @@ static void start_event_scan(void)
>   				&event_scan_work, event_scan_delay);
>   }
>
> +/* Cancel the rtas event scan work */
> +bool rtas_cancel_event_scan(void)
> +{
> +	return cancel_delayed_work_sync(&event_scan_work);
> +}
> +
>   static int __init rtas_init(void)
>   {
>   	struct proc_dir_entry *entry;
>

* [PATCH] PSeries: Cancel RTAS event scan before firmware flash
@ 2011-07-27 12:09 Ravi K. Nittala
  2011-08-08  6:46 ` Suzuki Poulose
  2011-08-30  6:03 ` Benjamin Herrenschmidt
  0 siblings, 2 replies; 11+ messages in thread
From: Ravi K. Nittala @ 2011-07-27 12:09 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: antonb, subrata.modak, mikey, sbest, suzuki, ranittal, divya.vikas

The firmware flash update is conducted using an RTAS call that is serialized
by lock_rtas(), which uses spin_lock. rtasd keeps scanning for the RTAS events
generated on the machine. This is performed via a delayed workqueue, invoking
an RTAS call to scan the events.

The flash update takes a while to complete and during this time, any other
RTAS call has to wait. In this case, rtas_event_scan() waits for a long time
on the spin_lock resulting in a soft lockup.

Approaches to fix the issue :

Approach 1: Stop all the other CPUs before we start flashing the firmware.

Before the rtas firmware update starts, all other CPUs should be stopped,
which means no other CPU should be in lock_rtas(). We do not want other CPUs
to execute while the FW update is in progress, and the system will be rebooted
anyway after the update.

--- arch/powerpc/kernel/setup-common.c.orig    2011-07-01 22:41:12.952507971 -0400
+++ arch/powerpc/kernel/setup-common.c    2011-07-01 22:48:31.182507915 -0400
@@ -109,11 +109,12 @@ void machine_shutdown(void)
  void machine_restart(char *cmd)
  {
      machine_shutdown();
-    if (ppc_md.restart)
-        ppc_md.restart(cmd);
  #ifdef CONFIG_SMP
-    smp_send_stop();
+        smp_send_stop();
  #endif
+    if (ppc_md.restart)
+        ppc_md.restart(cmd);
+
      printk(KERN_EMERG "System Halted, OK to turn off power\n");
      local_irq_disable();
      while (1) ;

Problems with this approach:
Stopping the CPUs suddenly may cause other serious problems depending on what
was running on them. Hence, this approach cannot be considered.


Approach 2: Cancel the rtas_event_scan work before starting the firmware flash.

Just before the flash update is performed, the queued rtas_event_scan() work
item is cancelled from the work queue so that there is no other RTAS call
issued while the flash is in progress. After the flash completes, the system
reboots and the rtas_event_scan() is rescheduled.

Approach 2 looks to be a better solution than Approach 1. Kindly let us know
your thoughts. Patch attached.


Signed-off-by: Suzuki Poulose <suzuki@in.ibm.com>
Signed-off-by: Ravi Nittala <ravi.nittala@in.ibm.com>


---
 arch/powerpc/include/asm/rtas.h  |    2 ++
 arch/powerpc/kernel/rtas_flash.c |    6 ++++++
 arch/powerpc/kernel/rtasd.c      |    6 ++++++
 3 files changed, 14 insertions(+), 0 deletions(-)

diff --git a/arch/powerpc/include/asm/rtas.h b/arch/powerpc/include/asm/rtas.h
index 58625d1..3f26f87 100644
--- a/arch/powerpc/include/asm/rtas.h
+++ b/arch/powerpc/include/asm/rtas.h
@@ -245,6 +245,8 @@ extern int early_init_dt_scan_rtas(unsigned long node,
 
 extern void pSeries_log_error(char *buf, unsigned int err_type, int fatal);
 
+extern bool rtas_cancel_event_scan(void);
+
 /* Error types logged.  */
 #define ERR_FLAG_ALREADY_LOGGED	0x0
 #define ERR_FLAG_BOOT		0x1 	/* log was pulled from NVRAM on boot */
diff --git a/arch/powerpc/kernel/rtas_flash.c b/arch/powerpc/kernel/rtas_flash.c
index e037c74..4174b4b 100644
--- a/arch/powerpc/kernel/rtas_flash.c
+++ b/arch/powerpc/kernel/rtas_flash.c
@@ -568,6 +568,12 @@ static void rtas_flash_firmware(int reboot_type)
 	}
 
 	/*
+	 * Just before starting the firmware flash, cancel the event scan work
+	 * to avoid any soft lockup issues.
+	 */
+	rtas_cancel_event_scan();
+
+	/*
 	 * NOTE: the "first" block must be under 4GB, so we create
 	 * an entry with no data blocks in the reserved buffer in
 	 * the kernel data segment.
diff --git a/arch/powerpc/kernel/rtasd.c b/arch/powerpc/kernel/rtasd.c
index 481ef06..e8f03fa 100644
--- a/arch/powerpc/kernel/rtasd.c
+++ b/arch/powerpc/kernel/rtasd.c
@@ -472,6 +472,12 @@ static void start_event_scan(void)
 				 &event_scan_work, event_scan_delay);
 }
 
+/* Cancel the rtas event scan work */
+bool rtas_cancel_event_scan(void)
+{
+	return cancel_delayed_work_sync(&event_scan_work);
+}
+
 static int __init rtas_init(void)
 {
 	struct proc_dir_entry *entry;

Thread overview: 11+ messages
2011-09-21 10:29 [PATCH] PSeries: Cancel RTAS event scan before firmware flash Ravi K Nittala
2011-09-23  0:38 ` Benjamin Herrenschmidt
  -- strict thread matches above, loose matches on Subject: below --
2011-10-04  7:49 Ravi K Nittala
2011-10-04  7:54 ` Subrata Modak
2011-07-27 12:09 Ravi K. Nittala
2011-08-08  6:46 ` Suzuki Poulose
2011-08-30  6:03 ` Benjamin Herrenschmidt
2011-08-30  6:17   ` Suzuki Poulose
2011-08-30  6:19     ` Benjamin Herrenschmidt
2011-08-30  6:21       ` Benjamin Herrenschmidt
2011-08-30  6:57         ` Suzuki Poulose
