* [PATCH v2 0/2] powerpc/pseries: Improve serialization of PRRN events
@ 2018-07-17 19:40 John Allen
  2018-07-17 19:40 ` [PATCH v2 1/2] powerpc/pseries: Avoid blocking rtas polling handling multiple " John Allen
  2018-07-17 19:40 ` [PATCH v2 2/2] powerpc/pseries: Wait for completion of hotplug events during PRRN handling John Allen
  0 siblings, 2 replies; 14+ messages in thread
From: John Allen @ 2018-07-17 19:40 UTC (permalink / raw)
  To: linuxppc-dev, nfont

Stress testing has uncovered issues with handling continuously queued PRRN
events. Running PRRN events this way can seriously load the system given
the sheer volume of dlpar operations being handled. This patchset ensures
that PRRN events are handled more synchronously, only allowing the PRRN
handler to queue a single dlpar event at any given time. Additionally, it
ensures that rtas polling continues normally when multiple PRRN events are
queued simultaneously.

v2:
  -Unlock prrn_lock when PRRN operations are complete, not after handler is
   scheduled.
  -Remove call to flush_work, the previous broken method of serializing
   PRRN events.

John Allen (2):
  powerpc/pseries: Avoid blocking rtas polling handling multiple PRRN
    events
  powerpc/pseries: Wait for completion of hotplug events during PRRN
    handling

 arch/powerpc/kernel/rtasd.c               | 10 +++++++---
 arch/powerpc/platforms/pseries/mobility.c |  5 ++++-
 2 files changed, 11 insertions(+), 4 deletions(-)

-- 
2.17.1

* [PATCH v2 1/2] powerpc/pseries: Avoid blocking rtas polling handling multiple PRRN events
  2018-07-17 19:40 [PATCH v2 0/2] powerpc/pseries: Improve serialization of PRRN events John Allen
@ 2018-07-17 19:40 ` John Allen
  2018-07-20 16:12   ` Nathan Fontenot
  2018-07-23 13:27   ` Michael Ellerman
  2018-07-17 19:40 ` [PATCH v2 2/2] powerpc/pseries: Wait for completion of hotplug events during PRRN handling John Allen
  1 sibling, 2 replies; 14+ messages in thread
From: John Allen @ 2018-07-17 19:40 UTC (permalink / raw)
  To: linuxppc-dev, nfont; +Cc: John Allen

When a PRRN event is being handled and another PRRN event comes in, the
second event will block rtas polling waiting on the first to complete,
preventing any further rtas events from being handled. This is especially
problematic in the case that PRRN events are continuously being queued,
as rtas polling then remains blocked indefinitely.

This patch introduces a mutex that prevents any subsequent PRRN events from
running while a PRRN event is being handled, allowing rtas polling to
continue normally.

Signed-off-by: John Allen <jallen@linux.ibm.com>
---
v2:
  -Unlock prrn_lock when PRRN operations are complete, not after handler is
   scheduled.
  -Remove call to flush_work, the previous broken method of serializing
   PRRN events.
---
 arch/powerpc/kernel/rtasd.c | 10 +++++++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/kernel/rtasd.c b/arch/powerpc/kernel/rtasd.c
index 44d66c33d59d..845fc5aec178 100644
--- a/arch/powerpc/kernel/rtasd.c
+++ b/arch/powerpc/kernel/rtasd.c
@@ -35,6 +35,8 @@
 
 static DEFINE_SPINLOCK(rtasd_log_lock);
 
+static DEFINE_MUTEX(prrn_lock);
+
 static DECLARE_WAIT_QUEUE_HEAD(rtas_log_wait);
 
 static char *rtas_log_buf;
@@ -284,15 +286,17 @@ static void prrn_work_fn(struct work_struct *work)
 	 */
 	pseries_devicetree_update(-prrn_update_scope);
 	numa_update_cpu_topology(false);
+	mutex_unlock(&prrn_lock);
 }
 
 static DECLARE_WORK(prrn_work, prrn_work_fn);
 
 static void prrn_schedule_update(u32 scope)
 {
-	flush_work(&prrn_work);
-	prrn_update_scope = scope;
-	schedule_work(&prrn_work);
+	if (mutex_trylock(&prrn_lock)) {
+		prrn_update_scope = scope;
+		schedule_work(&prrn_work);
+	}
 }
 
 static void handle_rtas_event(const struct rtas_error_log *log)
-- 
2.17.1

* [PATCH v2 2/2] powerpc/pseries: Wait for completion of hotplug events during PRRN handling
  2018-07-17 19:40 [PATCH v2 0/2] powerpc/pseries: Improve serialization of PRRN events John Allen
  2018-07-17 19:40 ` [PATCH v2 1/2] powerpc/pseries: Avoid blocking rtas polling handling multiple " John Allen
@ 2018-07-17 19:40 ` John Allen
  2018-07-20 16:12   ` Nathan Fontenot
  2018-07-23 13:41   ` Michael Ellerman
  1 sibling, 2 replies; 14+ messages in thread
From: John Allen @ 2018-07-17 19:40 UTC (permalink / raw)
  To: linuxppc-dev, nfont; +Cc: John Allen

While handling PRRN events, the time to handle the actual hotplug events
dwarfs the time it takes to perform the device tree updates and queue the
hotplug events. In the case that PRRN events are being queued continuously,
hotplug events have been observed to be queued faster than the kernel can
actually handle them. This patch avoids the problem by waiting for a
hotplug request to complete before queueing more hotplug events.

Signed-off-by: John Allen <jallen@linux.ibm.com>
---
 arch/powerpc/platforms/pseries/mobility.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/platforms/pseries/mobility.c b/arch/powerpc/platforms/pseries/mobility.c
index 8a8033a249c7..49930848fa78 100644
--- a/arch/powerpc/platforms/pseries/mobility.c
+++ b/arch/powerpc/platforms/pseries/mobility.c
@@ -242,6 +242,7 @@ static int add_dt_node(__be32 parent_phandle, __be32 drc_index)
 static void prrn_update_node(__be32 phandle)
 {
 	struct pseries_hp_errorlog *hp_elog;
+	struct completion hotplug_done;
 	struct device_node *dn;
 
 	/*
@@ -263,7 +264,9 @@ static void prrn_update_node(__be32 phandle)
 	hp_elog->id_type = PSERIES_HP_ELOG_ID_DRC_INDEX;
 	hp_elog->_drc_u.drc_index = phandle;
 
-	queue_hotplug_event(hp_elog, NULL, NULL);
+	init_completion(&hotplug_done);
+	queue_hotplug_event(hp_elog, &hotplug_done, NULL);
+	wait_for_completion(&hotplug_done);
 
 	kfree(hp_elog);
 }
-- 
2.17.1

* Re: [PATCH v2 1/2] powerpc/pseries: Avoid blocking rtas polling handling multiple PRRN events
  2018-07-17 19:40 ` [PATCH v2 1/2] powerpc/pseries: Avoid blocking rtas polling handling multiple " John Allen
@ 2018-07-20 16:12   ` Nathan Fontenot
  2018-07-23 13:27   ` Michael Ellerman
  1 sibling, 0 replies; 14+ messages in thread
From: Nathan Fontenot @ 2018-07-20 16:12 UTC (permalink / raw)
  To: John Allen, linuxppc-dev

On 07/17/2018 02:40 PM, John Allen wrote:
> When a PRRN event is being handled and another PRRN event comes in, the
> second event will block rtas polling waiting on the first to complete,
> preventing any further rtas events from being handled. This can be
> especially problematic in case that PRRN events are continuously being
> queued in which case rtas polling gets indefinitely blocked completely.
> 
> This patch introduces a mutex that prevents any subsequent PRRN events from
> running while there is a prrn event being handled, allowing rtas polling to
> continue normally.
> 
> Signed-off-by: John Allen <jallen@linux.ibm.com>

Reviewed-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>

> ---
> v2:
>    -Unlock prrn_lock when PRRN operations are complete, not after handler is
>     scheduled.
>    -Remove call to flush_work, the previous broken method of serializing
>     PRRN events.
> ---
>   arch/powerpc/kernel/rtasd.c | 10 +++++++---
>   1 file changed, 7 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/powerpc/kernel/rtasd.c b/arch/powerpc/kernel/rtasd.c
> index 44d66c33d59d..845fc5aec178 100644
> --- a/arch/powerpc/kernel/rtasd.c
> +++ b/arch/powerpc/kernel/rtasd.c
> @@ -35,6 +35,8 @@
> 
>   static DEFINE_SPINLOCK(rtasd_log_lock);
> 
> +static DEFINE_MUTEX(prrn_lock);
> +
>   static DECLARE_WAIT_QUEUE_HEAD(rtas_log_wait);
> 
>   static char *rtas_log_buf;
> @@ -284,15 +286,17 @@ static void prrn_work_fn(struct work_struct *work)
>   	 */
>   	pseries_devicetree_update(-prrn_update_scope);
>   	numa_update_cpu_topology(false);
> +	mutex_unlock(&prrn_lock);
>   }
> 
>   static DECLARE_WORK(prrn_work, prrn_work_fn);
> 
>   static void prrn_schedule_update(u32 scope)
>   {
> -	flush_work(&prrn_work);
> -	prrn_update_scope = scope;
> -	schedule_work(&prrn_work);
> +	if (mutex_trylock(&prrn_lock)) {
> +		prrn_update_scope = scope;
> +		schedule_work(&prrn_work);
> +	}
>   }
> 
>   static void handle_rtas_event(const struct rtas_error_log *log)
> 

* Re: [PATCH v2 2/2] powerpc/pseries: Wait for completion of hotplug events during PRRN handling
  2018-07-17 19:40 ` [PATCH v2 2/2] powerpc/pseries: Wait for completion of hotplug events during PRRN handling John Allen
@ 2018-07-20 16:12   ` Nathan Fontenot
  2018-07-23 13:41   ` Michael Ellerman
  1 sibling, 0 replies; 14+ messages in thread
From: Nathan Fontenot @ 2018-07-20 16:12 UTC (permalink / raw)
  To: John Allen, linuxppc-dev

On 07/17/2018 02:40 PM, John Allen wrote:
> While handling PRRN events, the time to handle the actual hotplug events
> dwarfs the time it takes to perform the device tree updates and queue the
> hotplug events. In the case that PRRN events are being queued continuously,
> hotplug events have been observed to be queued faster than the kernel can
> actually handle them. This patch avoids the problem by waiting for a
> hotplug request to complete before queueing more hotplug events.
> 
> Signed-off-by: John Allen <jallen@linux.ibm.com>

Reviewed-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>

> ---
>   arch/powerpc/platforms/pseries/mobility.c | 5 ++++-
>   1 file changed, 4 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/powerpc/platforms/pseries/mobility.c b/arch/powerpc/platforms/pseries/mobility.c
> index 8a8033a249c7..49930848fa78 100644
> --- a/arch/powerpc/platforms/pseries/mobility.c
> +++ b/arch/powerpc/platforms/pseries/mobility.c
> @@ -242,6 +242,7 @@ static int add_dt_node(__be32 parent_phandle, __be32 drc_index)
>   static void prrn_update_node(__be32 phandle)
>   {
>   	struct pseries_hp_errorlog *hp_elog;
> +	struct completion hotplug_done;
>   	struct device_node *dn;
> 
>   	/*
> @@ -263,7 +264,9 @@ static void prrn_update_node(__be32 phandle)
>   	hp_elog->id_type = PSERIES_HP_ELOG_ID_DRC_INDEX;
>   	hp_elog->_drc_u.drc_index = phandle;
> 
> -	queue_hotplug_event(hp_elog, NULL, NULL);
> +	init_completion(&hotplug_done);
> +	queue_hotplug_event(hp_elog, &hotplug_done, NULL);
> +	wait_for_completion(&hotplug_done);
> 
>   	kfree(hp_elog);
>   }
> 

* Re: [PATCH v2 1/2] powerpc/pseries: Avoid blocking rtas polling handling multiple PRRN events
  2018-07-17 19:40 ` [PATCH v2 1/2] powerpc/pseries: Avoid blocking rtas polling handling multiple " John Allen
  2018-07-20 16:12   ` Nathan Fontenot
@ 2018-07-23 13:27   ` Michael Ellerman
  2018-07-23 15:05     ` John Allen
  1 sibling, 1 reply; 14+ messages in thread
From: Michael Ellerman @ 2018-07-23 13:27 UTC (permalink / raw)
  To: John Allen, linuxppc-dev, nfont; +Cc: John Allen

Hi John,

I'm a bit puzzled by this one.

John Allen <jallen@linux.ibm.com> writes:
> When a PRRN event is being handled and another PRRN event comes in, the
> second event will block rtas polling waiting on the first to complete,
> preventing any further rtas events from being handled. This can be
> especially problematic in case that PRRN events are continuously being
> queued in which case rtas polling gets indefinitely blocked completely.
>
> This patch introduces a mutex that prevents any subsequent PRRN events from
> running while there is a prrn event being handled, allowing rtas polling to
> continue normally.
>
> Signed-off-by: John Allen <jallen@linux.ibm.com>
> ---
> v2:
>   -Unlock prrn_lock when PRRN operations are complete, not after handler is
>    scheduled.
>   -Remove call to flush_work, the previous broken method of serializing
>    PRRN events.
> ---
>  arch/powerpc/kernel/rtasd.c | 10 +++++++---
>  1 file changed, 7 insertions(+), 3 deletions(-)
>
> diff --git a/arch/powerpc/kernel/rtasd.c b/arch/powerpc/kernel/rtasd.c
> index 44d66c33d59d..845fc5aec178 100644
> --- a/arch/powerpc/kernel/rtasd.c
> +++ b/arch/powerpc/kernel/rtasd.c
> @@ -284,15 +286,17 @@ static void prrn_work_fn(struct work_struct *work)
>  	 */
>  	pseries_devicetree_update(-prrn_update_scope);
>  	numa_update_cpu_topology(false);
> +	mutex_unlock(&prrn_lock);
>  }
>  
>  static DECLARE_WORK(prrn_work, prrn_work_fn);
>  
>  static void prrn_schedule_update(u32 scope)
>  {
> -	flush_work(&prrn_work);

This seems like it's actually the core of the change. Previously we were
basically blocking on the flush before continuing.

> -	prrn_update_scope = scope;

I don't really understand the scope. With the old code we always ran the
work function once per call; now we potentially throw away the scope
value (if the trylock fails).

> -	schedule_work(&prrn_work);
> +	if (mutex_trylock(&prrn_lock)) {
> +		prrn_update_scope = scope;
> +		schedule_work(&prrn_work);
> +	}

Ignoring the scope, the addition of the mutex should not actually make
any difference. If you see the doco for schedule_work() it says:

 * This puts a job in the kernel-global workqueue if it was not already
 * queued and leaves it in the same position on the kernel-global
 * workqueue otherwise.


So the mutex basically implements that existing behaviour. But maybe the
scope is the issue? Like I said I don't really understand the scope
value.


So I guess I'm wondering if we just need to drop the flush_work() and
the rest is not required?

cheers

* Re: [PATCH v2 2/2] powerpc/pseries: Wait for completion of hotplug events during PRRN handling
  2018-07-17 19:40 ` [PATCH v2 2/2] powerpc/pseries: Wait for completion of hotplug events during PRRN handling John Allen
  2018-07-20 16:12   ` Nathan Fontenot
@ 2018-07-23 13:41   ` Michael Ellerman
  2018-07-23 15:22     ` John Allen
  1 sibling, 1 reply; 14+ messages in thread
From: Michael Ellerman @ 2018-07-23 13:41 UTC (permalink / raw)
  To: John Allen, linuxppc-dev, nfont; +Cc: John Allen

John Allen <jallen@linux.ibm.com> writes:

> While handling PRRN events, the time to handle the actual hotplug events
> dwarfs the time it takes to perform the device tree updates and queue the
> hotplug events. In the case that PRRN events are being queued continuously,
> hotplug events have been observed to be queued faster than the kernel can
> actually handle them. This patch avoids the problem by waiting for a
> hotplug request to complete before queueing more hotplug events.

So do we need the hotplug work queue at all? Can we just call
handle_dlpar_errorlog() directly?

Or are we using the work queue to serialise things? And if so would a
mutex be better?

It looks like prrn_update_node() is called via at least prrn_work_fn()
and post_mobility_fixup().

The latter is called from migration_store(), which seems like it would
be harmless. But it is also called from pseries_suspend_enable_irqs(),
which I'm less clear on.

cheers

> diff --git a/arch/powerpc/platforms/pseries/mobility.c b/arch/powerpc/platforms/pseries/mobility.c
> index 8a8033a249c7..49930848fa78 100644
> --- a/arch/powerpc/platforms/pseries/mobility.c
> +++ b/arch/powerpc/platforms/pseries/mobility.c
> @@ -242,6 +242,7 @@ static int add_dt_node(__be32 parent_phandle, __be32 drc_index)
>  static void prrn_update_node(__be32 phandle)
>  {
>  	struct pseries_hp_errorlog *hp_elog;
> +	struct completion hotplug_done;
>  	struct device_node *dn;
>  
>  	/*
> @@ -263,7 +264,9 @@ static void prrn_update_node(__be32 phandle)
>  	hp_elog->id_type = PSERIES_HP_ELOG_ID_DRC_INDEX;
>  	hp_elog->_drc_u.drc_index = phandle;
>  
> -	queue_hotplug_event(hp_elog, NULL, NULL);
> +	init_completion(&hotplug_done);
> +	queue_hotplug_event(hp_elog, &hotplug_done, NULL);
> +	wait_for_completion(&hotplug_done);
>  
>  	kfree(hp_elog);
>  }
> -- 
> 2.17.1

* Re: [PATCH v2 1/2] powerpc/pseries: Avoid blocking rtas polling handling multiple PRRN events
  2018-07-23 13:27   ` Michael Ellerman
@ 2018-07-23 15:05     ` John Allen
  2018-08-01 13:02       ` Michael Ellerman
  0 siblings, 1 reply; 14+ messages in thread
From: John Allen @ 2018-07-23 15:05 UTC (permalink / raw)
  To: Michael Ellerman; +Cc: linuxppc-dev, nfont

On Mon, Jul 23, 2018 at 11:27:56PM +1000, Michael Ellerman wrote:
>Hi John,
>
>I'm a bit puzzled by this one.
>
>John Allen <jallen@linux.ibm.com> writes:
>> When a PRRN event is being handled and another PRRN event comes in, the
>> second event will block rtas polling waiting on the first to complete,
>> preventing any further rtas events from being handled. This can be
>> especially problematic in case that PRRN events are continuously being
>> queued in which case rtas polling gets indefinitely blocked completely.
>>
>> This patch introduces a mutex that prevents any subsequent PRRN events from
>> running while there is a prrn event being handled, allowing rtas polling to
>> continue normally.
>>
>> Signed-off-by: John Allen <jallen@linux.ibm.com>
>> ---
>> v2:
>>   -Unlock prrn_lock when PRRN operations are complete, not after handler is
>>    scheduled.
>>   -Remove call to flush_work, the previous broken method of serializing
>>    PRRN events.
>> ---
>>  arch/powerpc/kernel/rtasd.c | 10 +++++++---
>>  1 file changed, 7 insertions(+), 3 deletions(-)
>>
>> diff --git a/arch/powerpc/kernel/rtasd.c b/arch/powerpc/kernel/rtasd.c
>> index 44d66c33d59d..845fc5aec178 100644
>> --- a/arch/powerpc/kernel/rtasd.c
>> +++ b/arch/powerpc/kernel/rtasd.c
>> @@ -284,15 +286,17 @@ static void prrn_work_fn(struct work_struct *work)
>>  	 */
>>  	pseries_devicetree_update(-prrn_update_scope);
>>  	numa_update_cpu_topology(false);
>> +	mutex_unlock(&prrn_lock);
>>  }
>>
>>  static DECLARE_WORK(prrn_work, prrn_work_fn);
>>
>>  static void prrn_schedule_update(u32 scope)
>>  {
>> -	flush_work(&prrn_work);
>
>This seems like it's actually the core of the change. Previously we were
>basically blocking on the flush before continuing.

The idea here is to replace the blocking flush_work with a non-blocking
mutex. So rather than waiting on the running PRRN event to complete, we
bail out since a PRRN event is already running. The situation this is
meant to address is flooding the workqueue with PRRN events, which, as
in the situation in patch 2/2, can be queued up faster than they can
actually be handled.

>
>> -	prrn_update_scope = scope;
>
>I don't really understand the scope. With the old code we always ran the
>work function once for call, now we potentially throw away the scope
>value (if the try lock fails).

So anytime we actually want to run with the scope (in the event the
trylock succeeds), we schedule the work with the scope value set
accordingly, as seen in the code below. In the case that we don't want
to run a PRRN event (if one is already running), we do throw away the
scope and ignore the request entirely.

>
>> -	schedule_work(&prrn_work);
>> +	if (mutex_trylock(&prrn_lock)) {
>> +		prrn_update_scope = scope;
>> +		schedule_work(&prrn_work);
>> +	}
>
>Ignoring the scope, the addition of the mutex should not actually make
>any difference. If you see the doco for schedule_work() it says:
>
> * This puts a job in the kernel-global workqueue if it was not already
> * queued and leaves it in the same position on the kernel-global
> * workqueue otherwise.
>
>
>So the mutex basically implements that existing behaviour. But maybe the
>scope is the issue? Like I said I don't really understand the scope
>value.
>
>
>So I guess I'm wondering if we just need to drop the flush_work() and
>the rest is not required?

To sum up the above, the behavior without the mutex is not the same as
with the mutex. Without the mutex, anytime we get a PRRN event it will
get queued on the workqueue, which can get flooded if PRRN events are
queued continuously. With the mutex, only one PRRN event can be queued
for handling at once.

Hope that clears things up!

-John

>
>cheers
>

* Re: [PATCH v2 2/2] powerpc/pseries: Wait for completion of hotplug events during PRRN handling
  2018-07-23 13:41   ` Michael Ellerman
@ 2018-07-23 15:22     ` John Allen
  2018-08-01 13:16       ` Michael Ellerman
  0 siblings, 1 reply; 14+ messages in thread
From: John Allen @ 2018-07-23 15:22 UTC (permalink / raw)
  To: Michael Ellerman; +Cc: linuxppc-dev, nfont

On Mon, Jul 23, 2018 at 11:41:24PM +1000, Michael Ellerman wrote:
>John Allen <jallen@linux.ibm.com> writes:
>
>> While handling PRRN events, the time to handle the actual hotplug events
>> dwarfs the time it takes to perform the device tree updates and queue the
>> hotplug events. In the case that PRRN events are being queued continuously,
>> hotplug events have been observed to be queued faster than the kernel can
>> actually handle them. This patch avoids the problem by waiting for a
>> hotplug request to complete before queueing more hotplug events.
>
>So do we need the hotplug work queue at all? Can we just call
>handle_dlpar_errorlog() directly?
>
>Or are we using the work queue to serialise things? And if so would a
>mutex be better?

Right, the workqueue is meant to serialize all hotplug events and it
gets used for more than just PRRN events. I believe the motivation for
using the workqueue over a mutex is that KVM guests initiate hotplug
events through the hotplug interrupt and can queue fairly large
requests, meaning that in this scenario waiting for a lock would block
interrupts for a while. Using the workqueue allows us to serialize
hotplug events from different sources in the same way without worrying
about the context in which the event is generated.

>
>It looks like prrn_update_node() is called via at least, prrn_work_fn()
>and post_mobility_fixup().
>
>The latter is called from migration_store(), which seems like it would
>be harmless. But also from pseries_suspend_enable_irqs() which I'm less
>clear on.

Yeah, that doesn't seem to make sense based on the function name. Odd 
that prrn_update_node is being called from anywhere outside of handling 
PRRN events. Perhaps if other code paths are using the function, it 
needs a more generic name.

-John

>
>cheers
>
>> diff --git a/arch/powerpc/platforms/pseries/mobility.c b/arch/powerpc/platforms/pseries/mobility.c
>> index 8a8033a249c7..49930848fa78 100644
>> --- a/arch/powerpc/platforms/pseries/mobility.c
>> +++ b/arch/powerpc/platforms/pseries/mobility.c
>> @@ -242,6 +242,7 @@ static int add_dt_node(__be32 parent_phandle, __be32 drc_index)
>>  static void prrn_update_node(__be32 phandle)
>>  {
>>  	struct pseries_hp_errorlog *hp_elog;
>> +	struct completion hotplug_done;
>>  	struct device_node *dn;
>>
>>  	/*
>> @@ -263,7 +264,9 @@ static void prrn_update_node(__be32 phandle)
>>  	hp_elog->id_type = PSERIES_HP_ELOG_ID_DRC_INDEX;
>>  	hp_elog->_drc_u.drc_index = phandle;
>>
>> -	queue_hotplug_event(hp_elog, NULL, NULL);
>> +	init_completion(&hotplug_done);
>> +	queue_hotplug_event(hp_elog, &hotplug_done, NULL);
>> +	wait_for_completion(&hotplug_done);
>>
>>  	kfree(hp_elog);
>>  }
>> --
>> 2.17.1
>

* Re: [PATCH v2 1/2] powerpc/pseries: Avoid blocking rtas polling handling multiple PRRN events
  2018-07-23 15:05     ` John Allen
@ 2018-08-01 13:02       ` Michael Ellerman
  2018-08-06 19:09         ` John Allen
  0 siblings, 1 reply; 14+ messages in thread
From: Michael Ellerman @ 2018-08-01 13:02 UTC (permalink / raw)
  To: John Allen; +Cc: linuxppc-dev, nfont

Hi John,

I'm still not sure about this one.

John Allen <jallen@linux.ibm.com> writes:
> On Mon, Jul 23, 2018 at 11:27:56PM +1000, Michael Ellerman wrote:
>>Hi John,
>>
>>I'm a bit puzzled by this one.
>>
>>John Allen <jallen@linux.ibm.com> writes:
>>> When a PRRN event is being handled and another PRRN event comes in, the
>>> second event will block rtas polling waiting on the first to complete,
>>> preventing any further rtas events from being handled. This can be
>>> especially problematic in case that PRRN events are continuously being
>>> queued in which case rtas polling gets indefinitely blocked completely.
>>>
>>> This patch introduces a mutex that prevents any subsequent PRRN events from
>>> running while there is a prrn event being handled, allowing rtas polling to
>>> continue normally.
>>>
>>> Signed-off-by: John Allen <jallen@linux.ibm.com>
>>> ---
>>> v2:
>>>   -Unlock prrn_lock when PRRN operations are complete, not after handler is
>>>    scheduled.
>>>   -Remove call to flush_work, the previous broken method of serializing
>>>    PRRN events.
>>> ---
>>>  arch/powerpc/kernel/rtasd.c | 10 +++++++---
>>>  1 file changed, 7 insertions(+), 3 deletions(-)
>>>
>>> diff --git a/arch/powerpc/kernel/rtasd.c b/arch/powerpc/kernel/rtasd.c
>>> index 44d66c33d59d..845fc5aec178 100644
>>> --- a/arch/powerpc/kernel/rtasd.c
>>> +++ b/arch/powerpc/kernel/rtasd.c
>>> @@ -284,15 +286,17 @@ static void prrn_work_fn(struct work_struct *work)
>>>  	 */
>>>  	pseries_devicetree_update(-prrn_update_scope);
>>>  	numa_update_cpu_topology(false);
>>> +	mutex_unlock(&prrn_lock);
>>>  }
>>>
>>>  static DECLARE_WORK(prrn_work, prrn_work_fn);
>>>
>>>  static void prrn_schedule_update(u32 scope)
>>>  {
>>> -	flush_work(&prrn_work);
>>
>>This seems like it's actually the core of the change. Previously we were
>>basically blocking on the flush before continuing.
>
> The idea here is to replace the blocking flush_work with a non-blocking 
> mutex. So rather than waiting on the running PRRN event to complete, we 
> bail out since a PRRN event is already running.

OK, but why is it OK to bail out?

The firmware sent you an error log asking you to do something, with a
scope value that has some meaning, and now you're just going to drop
that on the floor?

Maybe it is OK to just drop these events? Or maybe you're saying that
because the system is crashing under the load of too many events it's OK
to drop the events in this case.

> The situation this is 
> meant to address is flooding the workqueue with PRRN events, which like 
> the situation in patch 2/2, these can be queued up faster than they can 
> actually be handled.

I'm not really sure why this is a problem though.

The current code synchronously processes the events, so there should
only ever be one in flight.

I guess the issue is that each one can queue multiple events on the
hotplug work queue?

But still, we have terabytes of RAM, we should be able to queue a lot
of events before it becomes a problem.

So what exactly is getting flooded, what's the symptom?

If the queuing of the hotplug events is the problem, then why don't we
stop doing that? We could just process them synchronously from the PRRN
update, which would naturally throttle them.
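
For the PRRN path that would just mean doing the work in
prrn_update_node() instead of queueing it, e.g. (completely untested,
and handle_dlpar_errorlog() would presumably need to be made callable
from mobility.c):

-	queue_hotplug_event(hp_elog, NULL, NULL);
+	handle_dlpar_errorlog(hp_elog);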

cheers

* Re: [PATCH v2 2/2] powerpc/pseries: Wait for completion of hotplug events during PRRN handling
  2018-07-23 15:22     ` John Allen
@ 2018-08-01 13:16       ` Michael Ellerman
  2018-08-07 19:26         ` John Allen
  0 siblings, 1 reply; 14+ messages in thread
From: Michael Ellerman @ 2018-08-01 13:16 UTC (permalink / raw)
  To: John Allen; +Cc: linuxppc-dev, nfont

John Allen <jallen@linux.ibm.com> writes:

> On Mon, Jul 23, 2018 at 11:41:24PM +1000, Michael Ellerman wrote:
>>John Allen <jallen@linux.ibm.com> writes:
>>
>>> While handling PRRN events, the time to handle the actual hotplug events
>>> dwarfs the time it takes to perform the device tree updates and queue the
>>> hotplug events. In the case that PRRN events are being queued continuously,
>>> hotplug events have been observed to be queued faster than the kernel can
>>> actually handle them. This patch avoids the problem by waiting for a
>>> hotplug request to complete before queueing more hotplug events.

Have you tested this patch in isolation, ie. not with patch 1?

>>So do we need the hotplug work queue at all? Can we just call
>>handle_dlpar_errorlog() directly?
>>
>>Or are we using the work queue to serialise things? And if so would a
>>mutex be better?
>
> Right, the workqueue is meant to serialize all hotplug events and it 
> gets used for more than just PRRN events. I believe the motivation for 
> using the workqueue over a mutex is that KVM guests initiate hotplug 
> events through the hotplug interrupt and can queue fairly large requests 
> meaning that in this scenario, waiting for a lock would block interrupts
> for a while.

OK, but that just means that path needs to schedule work to run later.

> Using the workqueue allows us to serialize hotplug events 
> from different sources in the same way without worrying about the 
> context in which the event is generated.

A lock would be so much simpler.

It looks like we have three callers of queue_hotplug_event(), the dlpar
code, the mobility code and the ras interrupt.

The dlpar code already waits synchronously:

  init_completion(&hotplug_done);
  queue_hotplug_event(hp_elog, &hotplug_done, &rc);
  wait_for_completion(&hotplug_done);

You're changing mobility to do the same (this patch), leaving only the
ras interrupt that actually queues work and returns.


So it really seems like a mutex would do the trick, and the ras
interrupt would be the only case that needs to schedule work for later.
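
Something along these lines, maybe (completely untested, and
dlpar_handle_errorlog_sync() is just a made-up name for illustration):

  static DEFINE_MUTEX(dlpar_mutex);

  /* Process-context callers take the lock and do the work directly. */
  int dlpar_handle_errorlog_sync(struct pseries_hp_errorlog *hp_elog)
  {
  	int rc;

  	mutex_lock(&dlpar_mutex);
  	rc = handle_dlpar_errorlog(hp_elog);
  	mutex_unlock(&dlpar_mutex);

  	return rc;
  }

The ras interrupt would keep queueing work as it does now, with the work
function just calling the same helper.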

cheers

* Re: [PATCH v2 1/2] powerpc/pseries: Avoid blocking rtas polling handling multiple PRRN events
  2018-08-01 13:02       ` Michael Ellerman
@ 2018-08-06 19:09         ` John Allen
  0 siblings, 0 replies; 14+ messages in thread
From: John Allen @ 2018-08-06 19:09 UTC (permalink / raw)
  To: Michael Ellerman; +Cc: linuxppc-dev, nfont

On Wed, Aug 01, 2018 at 11:02:59PM +1000, Michael Ellerman wrote:
>Hi John,
>
>I'm still not sure about this one.
>
>John Allen <jallen@linux.ibm.com> writes:
>> On Mon, Jul 23, 2018 at 11:27:56PM +1000, Michael Ellerman wrote:
>>>Hi John,
>>>
>>>I'm a bit puzzled by this one.
>>>
>>>John Allen <jallen@linux.ibm.com> writes:
>>>> When a PRRN event is being handled and another PRRN event comes in, the
>>>> second event will block rtas polling waiting on the first to complete,
>>>> preventing any further rtas events from being handled. This can be
>>>> especially problematic in case that PRRN events are continuously being
>>>> queued in which case rtas polling gets indefinitely blocked completely.
>>>>
>>>> This patch introduces a mutex that prevents any subsequent PRRN events from
>>>> running while there is a prrn event being handled, allowing rtas polling to
>>>> continue normally.
>>>>
>>>> Signed-off-by: John Allen <jallen@linux.ibm.com>
>>>> ---
>>>> v2:
>>>>   -Unlock prrn_lock when PRRN operations are complete, not after handler is
>>>>    scheduled.
>>>>   -Remove call to flush_work, the previous broken method of serializing
>>>>    PRRN events.
>>>> ---
>>>>  arch/powerpc/kernel/rtasd.c | 10 +++++++---
>>>>  1 file changed, 7 insertions(+), 3 deletions(-)
>>>>
>>>> diff --git a/arch/powerpc/kernel/rtasd.c b/arch/powerpc/kernel/rtasd.c
>>>> index 44d66c33d59d..845fc5aec178 100644
>>>> --- a/arch/powerpc/kernel/rtasd.c
>>>> +++ b/arch/powerpc/kernel/rtasd.c
>>>> @@ -284,15 +286,17 @@ static void prrn_work_fn(struct work_struct *work)
>>>>  	 */
>>>>  	pseries_devicetree_update(-prrn_update_scope);
>>>>  	numa_update_cpu_topology(false);
>>>> +	mutex_unlock(&prrn_lock);
>>>>  }
>>>>
>>>>  static DECLARE_WORK(prrn_work, prrn_work_fn);
>>>>
>>>>  static void prrn_schedule_update(u32 scope)
>>>>  {
>>>> -	flush_work(&prrn_work);
>>>
>>>This seems like it's actually the core of the change. Previously we were
>>>basically blocking on the flush before continuing.
>>
>> The idea here is to replace the blocking flush_work with a non-blocking
>> mutex. So rather than waiting on the running PRRN event to complete, we
>> bail out since a PRRN event is already running.
>
>OK, but why is it OK to bail out?
>
>The firmware sent you an error log asking you to do something, with a
>scope value that has some meaning, and now you're just going to drop
>that on the floor?
>
>Maybe it is OK to just drop these events? Or maybe you're saying that
>because the system is crashing under the load of too many events it's OK
>to drop the events in this case.

I think I see your point. If a PRRN event comes in while another is
currently running, the new one may contain a different list of LMBs/CPUs
and the old list becomes outdated. With the mutex, the only event that
gets handled is the oldest, and we lose any additional changes beyond
the initial event. Therefore, as you mentioned in your previous message,
the behavior of the global workqueue should work just fine once we
remove the call to flush_work. While a PRRN event is running, only one
more will remain on the workqueue; when the first one completes, the
newly scheduled work function should grab the latest PRRN list.

I will send a new version of the patch with just the call to flush_work
removed.
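
That is, roughly (relative to mainline rather than to this patch,
untested):

 static void prrn_schedule_update(u32 scope)
 {
-	flush_work(&prrn_work);
 	prrn_update_scope = scope;
 	schedule_work(&prrn_work);
 }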

-John

>
>> The situation this is
>> meant to address is flooding the workqueue with PRRN events, which like
>> the situation in patch 2/2, these can be queued up faster than they can
>> actually be handled.
>
>I'm not really sure why this is a problem though.
>
>The current code synchronously processes the events, so there should
>only ever be one in flight.
>
>I guess the issue is that each one can queue multiple events on the
>hotplug work queue?
>
>But still, we have terabytes of RAM, we should be able to queue a lot
>of events before it becomes a problem.
>
>So what exactly is getting flooded, what's the symptom?
>
>If the queuing of the hotplug events is the problem, then why don't we
>stop doing that? We could just process them synchronously from the PRRN
>update, that would naturally throttle them.
>
>cheers
>

* Re: [PATCH v2 2/2] powerpc/pseries: Wait for completion of hotplug events during PRRN handling
  2018-08-01 13:16       ` Michael Ellerman
@ 2018-08-07 19:26         ` John Allen
  2018-08-08 13:38           ` Michael Ellerman
  0 siblings, 1 reply; 14+ messages in thread
From: John Allen @ 2018-08-07 19:26 UTC (permalink / raw)
  To: Michael Ellerman, nfont; +Cc: linuxppc-dev

On Wed, Aug 01, 2018 at 11:16:22PM +1000, Michael Ellerman wrote:
>John Allen <jallen@linux.ibm.com> writes:
>
>> On Mon, Jul 23, 2018 at 11:41:24PM +1000, Michael Ellerman wrote:
>>>John Allen <jallen@linux.ibm.com> writes:
>>>
>>>> While handling PRRN events, the time to handle the actual hotplug events
>>>> dwarfs the time it takes to perform the device tree updates and queue the
>>>> hotplug events. In the case that PRRN events are being queued continuously,
>>>> hotplug events have been observed to be queued faster than the kernel can
>>>> actually handle them. This patch avoids the problem by waiting for a
>>>> hotplug request to complete before queueing more hotplug events.
>
>Have you tested this patch in isolation, ie. not with patch 1?

While I was away on vacation, I believe a build was tested with just
this patch and not the first, and it has been running with no problems.
However, I think they've had trouble recreating the problem in general,
so it may just be that the environment is not set up properly to
recreate the issue.

>
>>>So do we need the hotplug work queue at all? Can we just call
>>>handle_dlpar_errorlog() directly?
>>>
>>>Or are we using the work queue to serialise things? And if so would a
>>>mutex be better?
>>
>> Right, the workqueue is meant to serialize all hotplug events and it
>> gets used for more than just PRRN events. I believe the motivation for
>> using the workqueue over a mutex is that KVM guests initiate hotplug
>> events through the hotplug interrupt and can queue fairly large requests
>> meaning that in this scenario, waiting for a lock would block interrupts
>> for a while.
>
>OK, but that just means that path needs to schedule work to run later.
>
>> Using the workqueue allows us to serialize hotplug events
>> from different sources in the same way without worrying about the
>> context in which the event is generated.
>
>A lock would be so much simpler.
>
>It looks like we have three callers of queue_hotplug_event(), the dlpar
>code, the mobility code and the ras interrupt.
>
>The dlpar code already waits synchronously:
>
>  init_completion(&hotplug_done);
>  queue_hotplug_event(hp_elog, &hotplug_done, &rc);
>  wait_for_completion(&hotplug_done);
>
>You're changing mobility to do the same (this patch), leaving only the
>ras interrupt that actually queues work and returns.
>
>
>So it really seems like a mutex would do the trick, and the ras
>interrupt would be the only case that needs to schedule work for later.

I think you may be right, but I would need some feedback from Nathan 
Fontenot before I redesign the queue. He's been thinking about that 
design for longer than I have and may know something that I don't 
regarding the reason we're using a workqueue rather than a mutex.

Given that the bug this is meant to address is pretty high priority, 
would you consider the wait_for_completion an acceptable stopgap while a 
more substantial redesign of this code is discussed?

-John

* Re: [PATCH v2 2/2] powerpc/pseries: Wait for completion of hotplug events during PRRN handling
  2018-08-07 19:26         ` John Allen
@ 2018-08-08 13:38           ` Michael Ellerman
  0 siblings, 0 replies; 14+ messages in thread
From: Michael Ellerman @ 2018-08-08 13:38 UTC (permalink / raw)
  To: John Allen, nfont; +Cc: linuxppc-dev

John Allen <jallen@linux.ibm.com> writes:
> On Wed, Aug 01, 2018 at 11:16:22PM +1000, Michael Ellerman wrote:
>>John Allen <jallen@linux.ibm.com> writes:
>>> On Mon, Jul 23, 2018 at 11:41:24PM +1000, Michael Ellerman wrote:
>>>>John Allen <jallen@linux.ibm.com> writes:
>>>>
>>>>> While handling PRRN events, the time to handle the actual hotplug events
>>>>> dwarfs the time it takes to perform the device tree updates and queue the
>>>>> hotplug events. In the case that PRRN events are being queued continuously,
>>>>> hotplug events have been observed to be queued faster than the kernel can
>>>>> actually handle them. This patch avoids the problem by waiting for a
>>>>> hotplug request to complete before queueing more hotplug events.
>>
>>Have you tested this patch in isolation, ie. not with patch 1?
>
> While I was away on vacation, I believe a build was tested with just 
> this patch and not the first and it has been running with no problems.  
> However, I think they've had problems recreating the problem in general 
> so it may just be that the environment is not setup properly to recreate 
> the issue.

Yes I asked Haren to test it :)

From memory there were still some warnings about the work queue being
blocked for long periods, but they weren't fatal.

>>>>So do we need the hotplug work queue at all? Can we just call
>>>>handle_dlpar_errorlog() directly?
>>>>
>>>>Or are we using the work queue to serialise things? And if so would a
>>>>mutex be better?
>>>
>>> Right, the workqueue is meant to serialize all hotplug events and it
>>> gets used for more than just PRRN events. I believe the motivation for
>>> using the workqueue over a mutex is that KVM guests initiate hotplug
>>> events through the hotplug interrupt and can queue fairly large requests
>>> meaning that in this scenario, waiting for a lock would block interrupts
>>> for a while.
>>
>>OK, but that just means that path needs to schedule work to run later.
>>
>>> Using the workqueue allows us to serialize hotplug events
>>> from different sources in the same way without worrying about the
>>> context in which the event is generated.
>>
>>A lock would be so much simpler.
>>
>>It looks like we have three callers of queue_hotplug_event(), the dlpar
>>code, the mobility code and the ras interrupt.
>>
>>The dlpar code already waits synchronously:
>>
>>  init_completion(&hotplug_done);
>>  queue_hotplug_event(hp_elog, &hotplug_done, &rc);
>>  wait_for_completion(&hotplug_done);
>>
>>You're changing mobility to do the same (this patch), leaving only the
>>ras interrupt that actually queues work and returns.
>>
>>
>>So it really seems like a mutex would do the trick, and the ras
>>interrupt would be the only case that needs to schedule work for later.
>
> I think you may be right, but I would need some feedback from Nathan 
> Fontenot before I redesign the queue. He's been thinking about that 
> design for longer than I have and may know something that I don't 
> regarding the reason we're using a workqueue rather than a mutex.

> Given that the bug this is meant to address is pretty high priority, 
> would you consider the wait_for_completion an acceptable stopgap while a 
> more substantial redesign of this code is discussed?

Yeah that would be OK.

cheers
