* [PATCH v2] powerpc/pseries: Ratelimit EPOW event warnings
From: Kamalesh Babulal @ 2015-06-02  5:18 UTC
  To: linuxppc-dev
  Cc: Kamalesh Babulal, Anshuman Khandual, Anton Blanchard, Michael Ellerman

We print the respective warning after parsing EPOW interrupts,
prompting the user to take action depending upon the severity of the
event.

Sometimes the same EPOW event warning, such as the one below, can flood
the kernel log over a period of time. Limit the warnings by using the
ratelimited variant of pr_err. Also, merge adjacent pr_err/pr_emerg calls
into a single call to reduce the number of lines printed per warning.

May 25 03:46:34 alp kernel: Non critical power or cooling issue cleared
May 25 03:46:52 alp kernel: Non critical power or cooling issue cleared
May 25 03:53:48 alp kernel: Non critical power or cooling issue cleared
May 25 03:55:46 alp kernel: Non critical power or cooling issue cleared
May 25 03:56:34 alp kernel: Non critical power or cooling issue cleared
May 25 03:59:04 alp kernel: Non critical power or cooling issue cleared
May 25 04:02:01 alp kernel: Non critical power or cooling issue cleared
May 25 04:04:24 alp kernel: Non critical power or cooling issue cleared
May 25 04:07:18 alp kernel: Non critical power or cooling issue cleared
May 25 04:13:04 alp kernel: Non critical power or cooling issue cleared
May 25 04:22:04 alp kernel: Non critical power or cooling issue cleared
May 25 04:22:26 alp kernel: Non critical power or cooling issue cleared
May 25 04:22:36 alp kernel: Non critical power or cooling issue cleared

Signed-off-by: Kamalesh Babulal <kamalesh@linux.vnet.ibm.com>
Cc: Anshuman Khandual <khandual@linux.vnet.ibm.com>
Cc: Anton Blanchard <anton@samba.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
---
v2 Changes:
 - Merged multiple adjacent pr_err/pr_emerg calls into a single call to reduce
   multi-line warnings, based on Michael's comments.

 arch/powerpc/platforms/pseries/ras.c | 17 ++++++++---------
 1 file changed, 8 insertions(+), 9 deletions(-)

diff --git a/arch/powerpc/platforms/pseries/ras.c b/arch/powerpc/platforms/pseries/ras.c
index 02e4a17..3620935 100644
--- a/arch/powerpc/platforms/pseries/ras.c
+++ b/arch/powerpc/platforms/pseries/ras.c
@@ -145,17 +145,17 @@ static void rtas_parse_epow_errlog(struct rtas_error_log *log)
 
 	switch (action_code) {
 	case EPOW_RESET:
-		pr_err("Non critical power or cooling issue cleared");
+		pr_err_ratelimited("Non critical power or cooling issue cleared");
 		break;
 
 	case EPOW_WARN_COOLING:
-		pr_err("Non critical cooling issue reported by firmware");
-		pr_err("Check RTAS error log for details");
+		pr_err_ratelimited("Non critical cooling issue reported by firmware,"
+				   " Check RTAS error log for details");
 		break;
 
 	case EPOW_WARN_POWER:
-		pr_err("Non critical power issue reported by firmware");
-		pr_err("Check RTAS error log for details");
+		pr_err_ratelimited("Non critical power issue reported by firmware,"
+				   " Check RTAS error log for details");
 		break;
 
 	case EPOW_SYSTEM_SHUTDOWN:
@@ -169,15 +169,14 @@ static void rtas_parse_epow_errlog(struct rtas_error_log *log)
 
 	case EPOW_MAIN_ENCLOSURE:
 	case EPOW_POWER_OFF:
-		pr_emerg("Critical power/cooling issue reported by firmware");
-		pr_emerg("Check RTAS error log for details");
-		pr_emerg("Immediate power off");
+		pr_emerg("Critical power/cooling issue reported by firmware,"
+			 " Check RTAS error log for details. Immediate power off");
 		emergency_sync();
 		kernel_power_off();
 		break;
 
 	default:
-		pr_err("Unknown power/cooling event (action code %d)",
+		pr_err_ratelimited("Unknown power/cooling event (action code %d)",
 			action_code);
 	}
 }
-- 
2.1.2


* Re: [PATCH v2] powerpc/pseries: Ratelimit EPOW event warnings
From: Vipin K Parashar @ 2015-06-24 19:18 UTC
  To: Kamalesh Babulal, linuxppc-dev
  Cc: Anshuman Khandual, Anton Blanchard, Michael Ellerman


On 06/02/2015 10:48 AM, Kamalesh Babulal wrote:
> May 25 03:46:34 alp kernel: Non critical power or cooling issue cleared
> May 25 03:46:52 alp kernel: Non critical power or cooling issue cleared
> May 25 03:53:48 alp kernel: Non critical power or cooling issue cleared
> May 25 03:55:46 alp kernel: Non critical power or cooling issue cleared
> May 25 03:56:34 alp kernel: Non critical power or cooling issue cleared
> May 25 03:59:04 alp kernel: Non critical power or cooling issue cleared
> May 25 04:02:01 alp kernel: Non critical power or cooling issue cleared
> May 25 04:04:24 alp kernel: Non critical power or cooling issue cleared
> May 25 04:07:18 alp kernel: Non critical power or cooling issue cleared
> May 25 04:13:04 alp kernel: Non critical power or cooling issue cleared
> May 25 04:22:04 alp kernel: Non critical power or cooling issue cleared
> May 25 04:22:26 alp kernel: Non critical power or cooling issue cleared
> May 25 04:22:36 alp kernel: Non critical power or cooling issue cleared

These messages are minutes apart, so rate limiting won't help.
One solution could be a flag-based approach: set a flag once an
EPOW condition is detected and check that flag upon receiving EPOW_RESET.
The EPOW condition cleared message should be logged only if an EPOW was
previously detected, i.e., the flag is set.



* Re: [PATCH v2] powerpc/pseries: Ratelimit EPOW event warnings
From: Kamalesh Babulal @ 2015-07-14  7:51 UTC
  To: Vipin K Parashar
  Cc: linuxppc-dev, Anton Blanchard, Anshuman Khandual, Michael Ellerman

* Vipin K Parashar <vipin@linux.vnet.ibm.com> [2015-06-25 00:48:20]:

> These messages are minutes apart, so rate limiting won't help.
> One solution could be a flag-based approach: set a flag once an
> EPOW condition is detected and check that flag upon receiving EPOW_RESET.
> The EPOW condition cleared message should be logged only if an EPOW was
> previously detected, i.e., the flag is set.

Thanks for reviewing it. Sorry for the late response.

The patch below adds a bool flag, epow_state, which is initialized to false.
It is set to true when any EPOW event is reported and cleared once the event
is acknowledged by a reset. As seen in the example, the flood consists only
of reset events, so the reset action is now guarded by the flag (set only if
an event was previously reported), and multiple resets without a real EPOW
event are ignored.

I have only compile-tested the patch. If this approach sounds good,
I will resend it as a formal patch.


diff --git a/arch/powerpc/platforms/pseries/ras.c b/arch/powerpc/platforms/pseries/ras.c
index 02e4a17..4819b1d 100644
--- a/arch/powerpc/platforms/pseries/ras.c
+++ b/arch/powerpc/platforms/pseries/ras.c
@@ -40,6 +40,8 @@ static int ras_check_exception_token;
 #define EPOW_SENSOR_TOKEN	9
 #define EPOW_SENSOR_INDEX	0
 
+static bool epow_state = false;
+
 static irqreturn_t ras_epow_interrupt(int irq, void *dev_id);
 static irqreturn_t ras_error_interrupt(int irq, void *dev_id);
 
@@ -145,21 +147,27 @@ static void rtas_parse_epow_errlog(struct rtas_error_log *log)
 
 	switch (action_code) {
 	case EPOW_RESET:
-		pr_err("Non critical power or cooling issue cleared");
+		if (epow_state) {
+			pr_err("Non critical power or cooling issue cleared");
+			epow_state = false;
+		}
 		break;
 
 	case EPOW_WARN_COOLING:
-		pr_err("Non critical cooling issue reported by firmware");
-		pr_err("Check RTAS error log for details");
+		pr_err("Non critical cooling issue reported by firmware, "
+		       "Check RTAS error log for details");
+		epow_state = true;
 		break;
 
 	case EPOW_WARN_POWER:
-		pr_err("Non critical power issue reported by firmware");
-		pr_err("Check RTAS error log for details");
+		pr_err("Non critical power issue reported by firmware, "
+		       "Check RTAS error log for details");
+		epow_state = true;
 		break;
 
 	case EPOW_SYSTEM_SHUTDOWN:
 		handle_system_shutdown(epow_log->event_modifier);
+		epow_state = true;
 		break;
 
 	case EPOW_SYSTEM_HALT:
@@ -169,9 +177,8 @@ static void rtas_parse_epow_errlog(struct rtas_error_log *log)
 
 	case EPOW_MAIN_ENCLOSURE:
 	case EPOW_POWER_OFF:
-		pr_emerg("Critical power/cooling issue reported by firmware");
-		pr_emerg("Check RTAS error log for details");
-		pr_emerg("Immediate power off");
+		pr_emerg("Critical power/cooling issue reported by firmware, "
+			 "Check RTAS error log for details. Immediate power off.");
 		emergency_sync();
 		kernel_power_off();
 		break;
@@ -179,6 +186,7 @@ static void rtas_parse_epow_errlog(struct rtas_error_log *log)
 	default:
 		pr_err("Unknown power/cooling event (action code %d)",
 			action_code);
+		epow_state = true;
 	}
 }
 


* Re: [PATCH v2] powerpc/pseries: Ratelimit EPOW event warnings
From: Vipin K Parashar @ 2015-07-14  8:32 UTC
  To: Kamalesh Babulal
  Cc: linuxppc-dev, Anton Blanchard, Anshuman Khandual, Michael Ellerman

Patch looks good to me. A small nitpick below.

On 07/14/2015 01:21 PM, Kamalesh Babulal wrote:
> diff --git a/arch/powerpc/platforms/pseries/ras.c b/arch/powerpc/platforms/pseries/ras.c
> index 02e4a17..4819b1d 100644
> --- a/arch/powerpc/platforms/pseries/ras.c
> +++ b/arch/powerpc/platforms/pseries/ras.c
> @@ -40,6 +40,8 @@ static int ras_check_exception_token;
>   #define EPOW_SENSOR_TOKEN	9
>   #define EPOW_SENSOR_INDEX	0
>
> +static bool epow_state = false;
> +

Explicit initialization isn't needed: a static variable is zero-initialized,
so it is already false. A one-line comment about the flag's usage would be good.



Thread overview: 4+ messages
2015-06-02  5:18 [PATCH v2] powerpc/pseries: Ratelimit EPOW event warnings Kamalesh Babulal
2015-06-24 19:18 ` Vipin K Parashar
2015-07-14  7:51   ` Kamalesh Babulal
2015-07-14  8:32     ` Vipin K Parashar
