linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 1/3] ACPI: EC: Fix an EC event IRQ storming issue
@ 2017-06-14  5:59 Lv Zheng
  2017-06-14  5:59 ` [PATCH 2/3] ACPI: EC: Fix EC command visibility for dynamic debug Lv Zheng
                   ` (2 more replies)
  0 siblings, 3 replies; 7+ messages in thread
From: Lv Zheng @ 2017-06-14  5:59 UTC (permalink / raw)
  To: Rafael J . Wysocki, Rafael J . Wysocki, Len Brown
  Cc: Lv Zheng, Lv Zheng, linux-kernel, linux-acpi

The EC event IRQ (SCI_EVT) can only be handled by submitting QR_EC. As the
EC driver handles SCI_EVT in a workqueue, after SCI_EVT is flagged and
before QR_EC is submitted, there is a period risking IRQ storming. EC IRQ
must be masked for this period but linux EC driver never does so.

No end user notices the IRQ storming and no developer fixes this known
issue because:
1. the EC IRQ is always edge triggered GPE, and
2. the kernel can execute no-op EC IRQ handler very fast.
For edge triggered EC GPE platforms, it is only reported of post-resume EC
event lost issues, there won't be an IRQ storming. For level triggered EC
GPE platforms, fortunately the kernel is always fast enough to execute such
a no-op EC IRQ handler so that the IRQ handler won't be accumulated to
starve the task contexts, causing a real IRQ storming.

But the IRQ storming actually can still happen when:
1. the EC IRQ performs like level triggered GPE, and
2. the kernel EC debugging log is turned on but the console is slow enough.
There are more and more platforms using EC GPE as wake GPE where the EC GPE
is likely designed as level triggered. Then when EC debugging log is
enabled, the EC IRQ handler is no longer a no-op but dumps IRQ status to
the consoles. If the consoles are slow enough, the EC IRQs can arrive much
faster than executing the handler. Finally the accumulated EC event IRQ
handlers starve the task contexts, causing the IRQ storming to occur, and
the kernel hangs can be observed during boot/resume.

See link #1 for reference, however the bug link can only be accessed by
priviledged Intel users.

This patch fixes this issue by masking EC IRQ for this period:
1. begins when there is an SCI_EVT IRQ pending, and
2. ends when there is a QR_EC completed (SCI_EVT acknowledged).

Link: https://jira01.devtools.intel.com/browse/LCK-4004 [#1]
Tested-by: Wang Wendy <wendy.wang@intel.com>
Tested-by: Feng Chenzhou <chenzhoux.feng@intel.com>
Signed-off-by: Lv Zheng <lv.zheng@intel.com>
---
 drivers/acpi/ec.c | 11 ++++++-----
 1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/drivers/acpi/ec.c b/drivers/acpi/ec.c
index c24235d..30d7f82 100644
--- a/drivers/acpi/ec.c
+++ b/drivers/acpi/ec.c
@@ -459,8 +459,10 @@ static bool acpi_ec_submit_flushable_request(struct acpi_ec *ec)
 
 static void acpi_ec_submit_query(struct acpi_ec *ec)
 {
-	if (acpi_ec_event_enabled(ec) &&
-	    !test_and_set_bit(EC_FLAGS_QUERY_PENDING, &ec->flags)) {
+	acpi_ec_set_storm(ec, EC_FLAGS_COMMAND_STORM);
+	if (!acpi_ec_event_enabled(ec))
+		return;
+	if (!test_and_set_bit(EC_FLAGS_QUERY_PENDING, &ec->flags)) {
 		ec_dbg_evt("Command(%s) submitted/blocked",
 			   acpi_ec_cmd_string(ACPI_EC_COMMAND_QUERY));
 		ec->nr_pending_queries++;
@@ -470,11 +472,10 @@ static void acpi_ec_submit_query(struct acpi_ec *ec)
 
 static void acpi_ec_complete_query(struct acpi_ec *ec)
 {
-	if (test_bit(EC_FLAGS_QUERY_PENDING, &ec->flags)) {
-		clear_bit(EC_FLAGS_QUERY_PENDING, &ec->flags);
+	if (test_and_clear_bit(EC_FLAGS_QUERY_PENDING, &ec->flags))
 		ec_dbg_evt("Command(%s) unblocked",
 			   acpi_ec_cmd_string(ACPI_EC_COMMAND_QUERY));
-	}
+	acpi_ec_clear_storm(ec, EC_FLAGS_COMMAND_STORM);
 }
 
 static inline void __acpi_ec_enable_event(struct acpi_ec *ec)
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 7+ messages in thread

* [PATCH 2/3] ACPI: EC: Fix EC command visibility for dynamic debug
  2017-06-14  5:59 [PATCH 1/3] ACPI: EC: Fix an EC event IRQ storming issue Lv Zheng
@ 2017-06-14  5:59 ` Lv Zheng
  2017-06-14  5:59 ` [PATCH 3/3] ACPI: EC: Change EC noirq tuning to be an optional behavior Lv Zheng
  2017-06-28 21:23 ` [PATCH 1/3] ACPI: EC: Fix an EC event IRQ storming issue Rafael J. Wysocki
  2 siblings, 0 replies; 7+ messages in thread
From: Lv Zheng @ 2017-06-14  5:59 UTC (permalink / raw)
  To: Rafael J . Wysocki, Rafael J . Wysocki, Len Brown
  Cc: Lv Zheng, Lv Zheng, linux-kernel, linux-acpi

acpi_ec_cmd_string() currently is only enabled for "DEBUG" macro, but users
trend to use CONFIG_DYNAMIC_DEBUG and enable ec.c pr_debug() print-outs by
"dyndbg='file ec.c +p'". In this use case, all command names are turned
into UNDEF and the log is confusing. This affects bugzilla triage work.

This patch fixes this issue by enabling acpi_ec_cmd_string() for
CONFIG_DYNAMIC_DEBUG.

Tested-by: Wang Wendy <wendy.wang@intel.com>
Tested-by: Feng Chenzhou <chenzhoux.feng@intel.com>
Signed-off-by: Lv Zheng <lv.zheng@intel.com>
---
 drivers/acpi/ec.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/acpi/ec.c b/drivers/acpi/ec.c
index 30d7f82..f3ff591 100644
--- a/drivers/acpi/ec.c
+++ b/drivers/acpi/ec.c
@@ -316,7 +316,7 @@ static inline void acpi_ec_write_data(struct acpi_ec *ec, u8 data)
 	ec->timestamp = jiffies;
 }
 
-#ifdef DEBUG
+#if defined(DEBUG) || defined(CONFIG_DYNAMIC_DEBUG)
 static const char *acpi_ec_cmd_string(u8 cmd)
 {
 	switch (cmd) {
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 7+ messages in thread

* [PATCH 3/3] ACPI: EC: Change EC noirq tuning to be an optional behavior
  2017-06-14  5:59 [PATCH 1/3] ACPI: EC: Fix an EC event IRQ storming issue Lv Zheng
  2017-06-14  5:59 ` [PATCH 2/3] ACPI: EC: Fix EC command visibility for dynamic debug Lv Zheng
@ 2017-06-14  5:59 ` Lv Zheng
  2017-06-28 21:31   ` Rafael J. Wysocki
  2017-06-28 21:23 ` [PATCH 1/3] ACPI: EC: Fix an EC event IRQ storming issue Rafael J. Wysocki
  2 siblings, 1 reply; 7+ messages in thread
From: Lv Zheng @ 2017-06-14  5:59 UTC (permalink / raw)
  To: Rafael J . Wysocki, Rafael J . Wysocki, Len Brown
  Cc: Lv Zheng, Lv Zheng, linux-kernel, linux-acpi

According to the bug report, though the busy polling mode can make noirq
stages executed faster, it causes abnormal fan blowing in noirq stages.

This patch prepares an option so that the automatic busy polling mode
switching for noirq stages can be enabled by who wants to tune it, not all
users.
Noticed that the new global option cannot be changed during noirq stages.
There is no need to lock its value changes to sync with polling mode
settings switches.

For reporters and testers in the thread, as there are too many reporters
on the bug link, this patch only picks names from most active commenters.
Sorry for the neglet.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=191181
Reported-by: Tatsuyuki Ishi <ishitatsuyuki@gmail.com>
Reported-by: Claudio Sacerdoti Coen <claudio.sacerdoticoen@unibo.it>
Tested-by: Nicolo' <nicolopiazzalunga@gmail.com>
Reported-by: Jens Axboe <axboe@kernel.dk>
Tested-by: Gjorgji Jankovski <j.gjorgji@gmail.com>
Tested-by: Damjan Georgievski <gdamjan@gmail.com>
Tested-by: Fernando Chaves <nanochaves@gmail.com>
Signed-off-by: Lv Zheng <lv.zheng@intel.com>
---
 drivers/acpi/ec.c | 20 ++++++++++++++------
 1 file changed, 14 insertions(+), 6 deletions(-)

diff --git a/drivers/acpi/ec.c b/drivers/acpi/ec.c
index f3ff591..de5dde6 100644
--- a/drivers/acpi/ec.c
+++ b/drivers/acpi/ec.c
@@ -151,6 +151,10 @@ static bool ec_freeze_events __read_mostly = true;
 module_param(ec_freeze_events, bool, 0644);
 MODULE_PARM_DESC(ec_freeze_events, "Disabling event handling during suspend/resume");
 
+static bool ec_tune_noirq __read_mostly = false;
+module_param(ec_tune_noirq, bool, 0644);
+MODULE_PARM_DESC(ec_tune_noirq, "Automatic enabling busy polling to tune noirq stages faster");
+
 struct acpi_ec_query_handler {
 	struct list_head node;
 	acpi_ec_query_func func;
@@ -979,9 +983,11 @@ static void acpi_ec_enter_noirq(struct acpi_ec *ec)
 	unsigned long flags;
 
 	spin_lock_irqsave(&ec->lock, flags);
-	ec->busy_polling = true;
-	ec->polling_guard = 0;
-	ec_log_drv("interrupt blocked");
+	if (ec_tune_noirq) {
+		ec->busy_polling = true;
+		ec->polling_guard = 0;
+		ec_log_drv("interrupt blocked");
+	}
 	spin_unlock_irqrestore(&ec->lock, flags);
 }
 
@@ -990,9 +996,11 @@ static void acpi_ec_leave_noirq(struct acpi_ec *ec)
 	unsigned long flags;
 
 	spin_lock_irqsave(&ec->lock, flags);
-	ec->busy_polling = ec_busy_polling;
-	ec->polling_guard = ec_polling_guard;
-	ec_log_drv("interrupt unblocked");
+	if (ec_tune_noirq) {
+		ec->busy_polling = ec_busy_polling;
+		ec->polling_guard = ec_polling_guard;
+		ec_log_drv("interrupt unblocked");
+	}
 	spin_unlock_irqrestore(&ec->lock, flags);
 }
 
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH 1/3] ACPI: EC: Fix an EC event IRQ storming issue
  2017-06-14  5:59 [PATCH 1/3] ACPI: EC: Fix an EC event IRQ storming issue Lv Zheng
  2017-06-14  5:59 ` [PATCH 2/3] ACPI: EC: Fix EC command visibility for dynamic debug Lv Zheng
  2017-06-14  5:59 ` [PATCH 3/3] ACPI: EC: Change EC noirq tuning to be an optional behavior Lv Zheng
@ 2017-06-28 21:23 ` Rafael J. Wysocki
  2 siblings, 0 replies; 7+ messages in thread
From: Rafael J. Wysocki @ 2017-06-28 21:23 UTC (permalink / raw)
  To: Lv Zheng
  Cc: Rafael J . Wysocki, Len Brown, Lv Zheng, linux-kernel, linux-acpi

On Wednesday, June 14, 2017 01:59:09 PM Lv Zheng wrote:
> The EC event IRQ (SCI_EVT) can only be handled by submitting QR_EC. As the
> EC driver handles SCI_EVT in a workqueue, after SCI_EVT is flagged and
> before QR_EC is submitted, there is a period risking IRQ storming. EC IRQ
> must be masked for this period but linux EC driver never does so.
> 
> No end user notices the IRQ storming and no developer fixes this known
> issue because:
> 1. the EC IRQ is always edge triggered GPE, and
> 2. the kernel can execute no-op EC IRQ handler very fast.
> For edge triggered EC GPE platforms, it is only reported of post-resume EC
> event lost issues, there won't be an IRQ storming. For level triggered EC
> GPE platforms, fortunately the kernel is always fast enough to execute such
> a no-op EC IRQ handler so that the IRQ handler won't be accumulated to
> starve the task contexts, causing a real IRQ storming.
> 
> But the IRQ storming actually can still happen when:
> 1. the EC IRQ performs like level triggered GPE, and
> 2. the kernel EC debugging log is turned on but the console is slow enough.
> There are more and more platforms using EC GPE as wake GPE where the EC GPE
> is likely designed as level triggered. Then when EC debugging log is
> enabled, the EC IRQ handler is no longer a no-op but dumps IRQ status to
> the consoles. If the consoles are slow enough, the EC IRQs can arrive much
> faster than executing the handler. Finally the accumulated EC event IRQ
> handlers starve the task contexts, causing the IRQ storming to occur, and
> the kernel hangs can be observed during boot/resume.
> 
> See link #1 for reference, however the bug link can only be accessed by
> priviledged Intel users.
> 
> This patch fixes this issue by masking EC IRQ for this period:
> 1. begins when there is an SCI_EVT IRQ pending, and
> 2. ends when there is a QR_EC completed (SCI_EVT acknowledged).
> 
> Link: https://jira01.devtools.intel.com/browse/LCK-4004 [#1]
> Tested-by: Wang Wendy <wendy.wang@intel.com>
> Tested-by: Feng Chenzhou <chenzhoux.feng@intel.com>
> Signed-off-by: Lv Zheng <lv.zheng@intel.com>

I've applied this and the [2/3], but I'm not sure about the [3/3].

I'll reply to that patch.

Thanks,
Rafael

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH 3/3] ACPI: EC: Change EC noirq tuning to be an optional behavior
  2017-06-14  5:59 ` [PATCH 3/3] ACPI: EC: Change EC noirq tuning to be an optional behavior Lv Zheng
@ 2017-06-28 21:31   ` Rafael J. Wysocki
  2017-07-03  5:15     ` Zheng, Lv
  0 siblings, 1 reply; 7+ messages in thread
From: Rafael J. Wysocki @ 2017-06-28 21:31 UTC (permalink / raw)
  To: Lv Zheng
  Cc: Rafael J . Wysocki, Len Brown, Lv Zheng, linux-kernel, linux-acpi

On Wednesday, June 14, 2017 01:59:24 PM Lv Zheng wrote:
> According to the bug report, though the busy polling mode can make noirq
> stages executed faster, it causes abnormal fan blowing in noirq stages.
> 
> This patch prepares an option so that the automatic busy polling mode
> switching for noirq stages can be enabled by who wants to tune it, not all
> users.
> Noticed that the new global option cannot be changed during noirq stages.
> There is no need to lock its value changes to sync with polling mode
> settings switches.
> 
> For reporters and testers in the thread, as there are too many reporters
> on the bug link, this patch only picks names from most active commenters.
> Sorry for the neglet.
> 
> Link: https://bugzilla.kernel.org/show_bug.cgi?id=191181
> Reported-by: Tatsuyuki Ishi <ishitatsuyuki@gmail.com>
> Reported-by: Claudio Sacerdoti Coen <claudio.sacerdoticoen@unibo.it>
> Tested-by: Nicolo' <nicolopiazzalunga@gmail.com>
> Reported-by: Jens Axboe <axboe@kernel.dk>
> Tested-by: Gjorgji Jankovski <j.gjorgji@gmail.com>
> Tested-by: Damjan Georgievski <gdamjan@gmail.com>
> Tested-by: Fernando Chaves <nanochaves@gmail.com>
> Signed-off-by: Lv Zheng <lv.zheng@intel.com>

First of all, this seems to be a fix for commit c3a696b6e8f8 (ACPI / EC: Use busy polling
mode when GPE is not enabled), so there should be a Fixes: tag pointing to that
one.

Moreover, if that is just a performance optimization and not a matter of correctness,
why don't we simply drop acpi_ec_enter/leave_noirq() entirely?

What is going to break if we do that?

Thanks,
Rafael

^ permalink raw reply	[flat|nested] 7+ messages in thread

* RE: [PATCH 3/3] ACPI: EC: Change EC noirq tuning to be an optional behavior
  2017-06-28 21:31   ` Rafael J. Wysocki
@ 2017-07-03  5:15     ` Zheng, Lv
  2017-07-06  8:30       ` Chen Yu
  0 siblings, 1 reply; 7+ messages in thread
From: Zheng, Lv @ 2017-07-03  5:15 UTC (permalink / raw)
  To: Rafael J. Wysocki, Chen, Yu C
  Cc: Wysocki, Rafael J, Brown, Len, Lv Zheng, linux-kernel, linux-acpi

Hi, Rafael

> From: Rafael J. Wysocki [mailto:rjw@rjwysocki.net]
> Subject: Re: [PATCH 3/3] ACPI: EC: Change EC noirq tuning to be an optional behavior
> 
> On Wednesday, June 14, 2017 01:59:24 PM Lv Zheng wrote:
> > According to the bug report, though the busy polling mode can make noirq
> > stages executed faster, it causes abnormal fan blowing in noirq stages.
> >
> > This patch prepares an option so that the automatic busy polling mode
> > switching for noirq stages can be enabled by who wants to tune it, not all
> > users.
> > Noticed that the new global option cannot be changed during noirq stages.
> > There is no need to lock its value changes to sync with polling mode
> > settings switches.
> >
> > For reporters and testers in the thread, as there are too many reporters
> > on the bug link, this patch only picks names from most active commenters.
> > Sorry for the neglet.
> >
> > Link: https://bugzilla.kernel.org/show_bug.cgi?id=191181
> > Reported-by: Tatsuyuki Ishi <ishitatsuyuki@gmail.com>
> > Reported-by: Claudio Sacerdoti Coen <claudio.sacerdoticoen@unibo.it>
> > Tested-by: Nicolo' <nicolopiazzalunga@gmail.com>
> > Reported-by: Jens Axboe <axboe@kernel.dk>
> > Tested-by: Gjorgji Jankovski <j.gjorgji@gmail.com>
> > Tested-by: Damjan Georgievski <gdamjan@gmail.com>
> > Tested-by: Fernando Chaves <nanochaves@gmail.com>
> > Signed-off-by: Lv Zheng <lv.zheng@intel.com>
> 
> First of all, this seems to be a fix for commit c3a696b6e8f8 (ACPI / EC: Use busy polling
> mode when GPE is not enabled), so there should be a Fixes: tag pointing to that
> one.
> 
> Moreover, if that is just a performance optimization and not a matter of correctness,
> why don't we simply drop acpi_ec_enter/leave_noirq() entirely?
> 
> What is going to break if we do that?

Let me Cc Yu for justification.
I just added busy poll support for suspend/boot according to the root cause reported by him.
He should know the end user requirements better than me.

Thanks and best regards
Lv

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH 3/3] ACPI: EC: Change EC noirq tuning to be an optional behavior
  2017-07-03  5:15     ` Zheng, Lv
@ 2017-07-06  8:30       ` Chen Yu
  0 siblings, 0 replies; 7+ messages in thread
From: Chen Yu @ 2017-07-06  8:30 UTC (permalink / raw)
  To: Zheng, Lv
  Cc: Rafael J. Wysocki, Wysocki, Rafael J, Brown, Len, Lv Zheng,
	linux-kernel, linux-acpi

Hi Lv,
On Mon, Jul 03, 2017 at 01:15:26PM +0800, Zheng, Lv wrote:
> Hi, Rafael
> 
> > From: Rafael J. Wysocki [mailto:rjw@rjwysocki.net]
> > Subject: Re: [PATCH 3/3] ACPI: EC: Change EC noirq tuning to be an optional behavior
> > 
> > On Wednesday, June 14, 2017 01:59:24 PM Lv Zheng wrote:
> > > According to the bug report, though the busy polling mode can make noirq
> > > stages executed faster, it causes abnormal fan blowing in noirq stages.
> > >
> > > This patch prepares an option so that the automatic busy polling mode
> > > switching for noirq stages can be enabled by who wants to tune it, not all
> > > users.
> > > Noticed that the new global option cannot be changed during noirq stages.
> > > There is no need to lock its value changes to sync with polling mode
> > > settings switches.
> > >
> > > For reporters and testers in the thread, as there are too many reporters
> > > on the bug link, this patch only picks names from most active commenters.
> > > Sorry for the neglet.
> > >
> > > Link: https://bugzilla.kernel.org/show_bug.cgi?id=191181
> > > Reported-by: Tatsuyuki Ishi <ishitatsuyuki@gmail.com>
> > > Reported-by: Claudio Sacerdoti Coen <claudio.sacerdoticoen@unibo.it>
> > > Tested-by: Nicolo' <nicolopiazzalunga@gmail.com>
> > > Reported-by: Jens Axboe <axboe@kernel.dk>
> > > Tested-by: Gjorgji Jankovski <j.gjorgji@gmail.com>
> > > Tested-by: Damjan Georgievski <gdamjan@gmail.com>
> > > Tested-by: Fernando Chaves <nanochaves@gmail.com>
> > > Signed-off-by: Lv Zheng <lv.zheng@intel.com>
> > 
> > First of all, this seems to be a fix for commit c3a696b6e8f8 (ACPI / EC: Use busy polling
> > mode when GPE is not enabled), so there should be a Fixes: tag pointing to that
> > one.
> > 
> > Moreover, if that is just a performance optimization and not a matter of correctness,
> > why don't we simply drop acpi_ec_enter/leave_noirq() entirely?
> > 
> > What is going to break if we do that?
> 
> Let me Cc Yu for justification.
> I just added busy poll support for suspend/boot according to the root cause reported by him.
> He should know the end user requirements better than me.
> 
I remembered the original issue was reported against the slowness during suspend/resume
due to the ec is running with GPE disabled, and falls into schedule_timeout() loop.

If the busy polling mode is controlled by the boot options rather than
acpi_ec_enter/leave_noirq(), then user should be noticed of the result
that the cpu usage might be always higer if they enable the busy polling
mode.

Thanks,
	Yu
> Thanks and best regards
> Lv

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2017-07-06  8:30 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-06-14  5:59 [PATCH 1/3] ACPI: EC: Fix an EC event IRQ storming issue Lv Zheng
2017-06-14  5:59 ` [PATCH 2/3] ACPI: EC: Fix EC command visibility for dynamic debug Lv Zheng
2017-06-14  5:59 ` [PATCH 3/3] ACPI: EC: Change EC noirq tuning to be an optional behavior Lv Zheng
2017-06-28 21:31   ` Rafael J. Wysocki
2017-07-03  5:15     ` Zheng, Lv
2017-07-06  8:30       ` Chen Yu
2017-06-28 21:23 ` [PATCH 1/3] ACPI: EC: Fix an EC event IRQ storming issue Rafael J. Wysocki

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).