linux-pm.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/2] ACPI: PM: s2idle: Fix possible suspend lock-up
@ 2019-11-28 22:42 Rafael J. Wysocki
  2019-11-28 22:47 ` [PATCH 1/2] ACPI: EC: Rework flushing of pending work Rafael J. Wysocki
                   ` (2 more replies)
  0 siblings, 3 replies; 5+ messages in thread
From: Rafael J. Wysocki @ 2019-11-28 22:42 UTC (permalink / raw)
  To: Linux ACPI, Kenneth R. Crudup; +Cc: Linux PM, LKML

Hi All,

The first patch in this series is a fix for a suspend-to-idle issues introduced
in 5.4 (see its changelog for details).

The second one is more of an optimization, although some systems may need it
too (depending on how fragile their platform firmware is).

Kenneth,

This series is roughly equivalent to the patch at

https://lore.kernel.org/linux-pm/CAJZ5v0h1Ro75++4xuCznkx6GNYd+G5NpMGP96z1jdh=dm9uZbw@mail.gmail.com/T/#m11ca9a14efe4e5193bbda69767595a5fb7bd5479

The main difference is that it flushes system_wq before ec_query_wq, but that
should not really matter if the issue you saw in 5.4 is the one described
in the changelog of patch [1/2].

Please test this series and let me know if it works for you too.

In case it does work, it would be good to know if patch [1/2] is sufficient to
fix the suspend problem at least with ec_no_wakeup == 0 (it may not be
sufficient with ec_no_wakeup == 1).

Thanks!




^ permalink raw reply	[flat|nested] 5+ messages in thread

* [PATCH 1/2] ACPI: EC: Rework flushing of pending work
  2019-11-28 22:42 [PATCH 0/2] ACPI: PM: s2idle: Fix possible suspend lock-up Rafael J. Wysocki
@ 2019-11-28 22:47 ` Rafael J. Wysocki
  2019-11-28 22:50 ` [PATCH 2/2] ACPI: PM: s2idle: Rework ACPI events synchronization Rafael J. Wysocki
  2019-11-29 20:35 ` [PATCH 0/2] ACPI: PM: s2idle: Fix possible suspend lock-up Kenneth R. Crudup
  2 siblings, 0 replies; 5+ messages in thread
From: Rafael J. Wysocki @ 2019-11-28 22:47 UTC (permalink / raw)
  To: Linux ACPI; +Cc: Kenneth R. Crudup, Linux PM, LKML

From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

There is a race condition in the ACPI EC driver, between
__acpi_ec_flush_event() and acpi_ec_event_handler(), that may
cause systems to stay in suspended-to-idle forever after a wakeup
event coming from the EC.

Namely, acpi_s2idle_wake() calls acpi_ec_flush_work() to wait until
the delayed work resulting from the handling of the EC GPE in
acpi_ec_dispatch_gpe() is processed, and that function invokes
__acpi_ec_flush_event() which uses wait_event() to wait for
ec->nr_pending_queries to become zero on ec->wait, and that wait
queue may be woken up too early.

Suppose that acpi_ec_dispatch_gpe() has caused acpi_ec_gpe_handler()
to run, so advance_transaction() has been called and it has invoked
acpi_ec_submit_query() to queue up an event work item, so
ec->nr_pending_queries has been incremented (under ec->lock).  The
work function of that work item, acpi_ec_event_handler() runs later
and calls acpi_ec_query() to process the event.  That function calls
acpi_ec_transaction() which invokes acpi_ec_transaction_unlocked()
and the latter wakes up ec->wait under ec->lock, but it drops that
lock before returning.

When acpi_ec_query() returns, acpi_ec_event_handler() acquires
ec->lock and decrements ec->nr_pending_queries, but at that point
__acpi_ec_flush_event() (woken up previously) may already have
acquired ec->lock, checked the value of ec->nr_pending_queries (and
it would not have been zero then) and decided to go back to sleep.
Next, if ec->nr_pending_queries is equal to zero now, the loop
in acpi_ec_event_handler() terminates, ec->lock is released and
acpi_ec_check_event() is called, but it does nothing unless
ec_event_clearing is equal to ACPI_EC_EVT_TIMING_EVENT (which is
not the case by default).  In the end, if no more event work items
have been queued up while executing acpi_ec_transaction_unlocked(),
there is nothing to wake up __acpi_ec_flush_event() again and it
sleeps forever, so the suspend-to-idle loop cannot make progress and
the system is permanently suspended.

To avoid this issue, notice that it actually is not necessary to
wait for ec->nr_pending_queries to become zero in every case in
which __acpi_ec_flush_event() is used.

First, during platform-based system suspend (not suspend-to-idle),
__acpi_ec_flush_event() is called by acpi_ec_disable_event() after
clearing the EC_FLAGS_QUERY_ENABLED flag, which prevents
acpi_ec_submit_query() from submitting any new event work items,
so calling flush_scheduled_work() and flushing ec_query_wq
subsequently (in order to wait until all of the queries in that
queue have been processed) would be sufficient to flush all of
the pending EC work in that case.

Second, the purpose of the flushing of pending EC work while
suspended-to-idle described above really is to wait until the
first event work item coming from acpi_ec_dispatch_gpe() is
complete, because it should produce system wakeup events if
that is a valid EC-based system wakeup, so calling
flush_scheduled_work() followed by flushing ec_query_wq is also
sufficient for that purpose.

Rework the code to follow the above observations.

Fixes: 56b9918490 ("PM: sleep: Simplify suspend-to-idle control flow")
Reported-by: Kenneth R. Crudup <kenny@panix.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
---
 drivers/acpi/ec.c |   36 +++++++++++++-----------------------
 1 file changed, 13 insertions(+), 23 deletions(-)

Index: linux-pm/drivers/acpi/ec.c
===================================================================
--- linux-pm.orig/drivers/acpi/ec.c
+++ linux-pm/drivers/acpi/ec.c
@@ -525,26 +525,10 @@ static void acpi_ec_enable_event(struct
 }
 
 #ifdef CONFIG_PM_SLEEP
-static bool acpi_ec_query_flushed(struct acpi_ec *ec)
+static void __acpi_ec_flush_work(void)
 {
-	bool flushed;
-	unsigned long flags;
-
-	spin_lock_irqsave(&ec->lock, flags);
-	flushed = !ec->nr_pending_queries;
-	spin_unlock_irqrestore(&ec->lock, flags);
-	return flushed;
-}
-
-static void __acpi_ec_flush_event(struct acpi_ec *ec)
-{
-	/*
-	 * When ec_freeze_events is true, we need to flush events in
-	 * the proper position before entering the noirq stage.
-	 */
-	wait_event(ec->wait, acpi_ec_query_flushed(ec));
-	if (ec_query_wq)
-		flush_workqueue(ec_query_wq);
+	flush_scheduled_work(); /* flush ec->work */
+	flush_workqueue(ec_query_wq); /* flush queries */
 }
 
 static void acpi_ec_disable_event(struct acpi_ec *ec)
@@ -554,15 +538,21 @@ static void acpi_ec_disable_event(struct
 	spin_lock_irqsave(&ec->lock, flags);
 	__acpi_ec_disable_event(ec);
 	spin_unlock_irqrestore(&ec->lock, flags);
-	__acpi_ec_flush_event(ec);
+
+	/*
+	 * When ec_freeze_events is true, we need to flush events in
+	 * the proper position before entering the noirq stage.
+	 */
+	__acpi_ec_flush_work();
 }
 
 void acpi_ec_flush_work(void)
 {
-	if (first_ec)
-		__acpi_ec_flush_event(first_ec);
+	/* Without ec_query_wq there is nothing to flush. */
+	if (!ec_query_wq)
+		return;
 
-	flush_scheduled_work();
+	__acpi_ec_flush_work();
 }
 #endif /* CONFIG_PM_SLEEP */
 




^ permalink raw reply	[flat|nested] 5+ messages in thread

* [PATCH 2/2] ACPI: PM: s2idle: Rework ACPI events synchronization
  2019-11-28 22:42 [PATCH 0/2] ACPI: PM: s2idle: Fix possible suspend lock-up Rafael J. Wysocki
  2019-11-28 22:47 ` [PATCH 1/2] ACPI: EC: Rework flushing of pending work Rafael J. Wysocki
@ 2019-11-28 22:50 ` Rafael J. Wysocki
  2019-11-29 20:35 ` [PATCH 0/2] ACPI: PM: s2idle: Fix possible suspend lock-up Kenneth R. Crudup
  2 siblings, 0 replies; 5+ messages in thread
From: Rafael J. Wysocki @ 2019-11-28 22:50 UTC (permalink / raw)
  To: Linux ACPI; +Cc: Kenneth R. Crudup, Linux PM, LKML

From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

Note that the EC GPE processing need not be synchronized in
acpi_s2idle_wake() after invoking acpi_ec_dispatch_gpe(), because
that function checks the GPE status and dispatches its handler if
need be and the SCI action handler is not going to run anyway at
that point.

Moreover, it is better to drain all of the pending ACPI events
before restoring the working-state configuration of GPEs in
acpi_s2idle_restore(), because those events are likely to be related
to system wakeup, in which case they will not be relevant going
forward.

Rework the code to take these observations into account.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
---
 drivers/acpi/sleep.c |   26 +++++++++++++++++++-------
 1 file changed, 19 insertions(+), 7 deletions(-)

Index: linux-pm/drivers/acpi/sleep.c
===================================================================
--- linux-pm.orig/drivers/acpi/sleep.c
+++ linux-pm/drivers/acpi/sleep.c
@@ -977,6 +977,16 @@ static int acpi_s2idle_prepare_late(void
 	return 0;
 }
 
+static void acpi_s2idle_sync(void)
+{
+	/*
+	 * The EC driver uses the system workqueue and an additional special
+	 * one, so those need to be flushed too.
+	 */
+	acpi_ec_flush_work();
+	acpi_os_wait_events_complete(); /* synchronize Notify handling */
+}
+
 static void acpi_s2idle_wake(void)
 {
 	/*
@@ -1001,13 +1011,8 @@ static void acpi_s2idle_wake(void)
 		 * should be missed by canceling the wakeup here.
 		 */
 		pm_system_cancel_wakeup();
-		/*
-		 * The EC driver uses the system workqueue and an additional
-		 * special one, so those need to be flushed too.
-		 */
-		acpi_os_wait_events_complete(); /* synchronize EC GPE processing */
-		acpi_ec_flush_work();
-		acpi_os_wait_events_complete(); /* synchronize Notify handling */
+
+		acpi_s2idle_sync();
 
 		rearm_wake_irq(acpi_sci_irq);
 	}
@@ -1024,6 +1029,13 @@ static void acpi_s2idle_restore_early(vo
 
 static void acpi_s2idle_restore(void)
 {
+	/*
+	 * Drain pending events before restoring the working-state configuration
+	 * of GPEs.
+	 */
+	acpi_os_wait_events_complete(); /* synchronize GPE processing */
+	acpi_s2idle_sync();
+
 	s2idle_wakeup = false;
 
 	acpi_enable_all_runtime_gpes();




^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH 0/2] ACPI: PM: s2idle: Fix possible suspend lock-up
  2019-11-28 22:42 [PATCH 0/2] ACPI: PM: s2idle: Fix possible suspend lock-up Rafael J. Wysocki
  2019-11-28 22:47 ` [PATCH 1/2] ACPI: EC: Rework flushing of pending work Rafael J. Wysocki
  2019-11-28 22:50 ` [PATCH 2/2] ACPI: PM: s2idle: Rework ACPI events synchronization Rafael J. Wysocki
@ 2019-11-29 20:35 ` Kenneth R. Crudup
  2019-12-01 19:54   ` Rafael J. Wysocki
  2 siblings, 1 reply; 5+ messages in thread
From: Kenneth R. Crudup @ 2019-11-29 20:35 UTC (permalink / raw)
  To: Rafael J. Wysocki; +Cc: Linux ACPI, Linux PM, LKML


On Thu, 28 Nov 2019, Rafael J. Wysocki wrote:

> Please test this series and let me know if it works for you too.

I've tested this quite a few times against the issues reported on the Linux-PM
mailing list ("resume failures if the charger is plugged in while suspended
and the battery is at < 90%" and "spurious wakeups from the EC for non-power
events") and this patchset has fixed both of these issues (the latter is
especially surprising and welcome).

I'm pretty happy with it; it's been a while since I've had a consistently
cold laptop while suspended. Much thanks!

	-Kenny

-- 
Kenneth R. Crudup  Sr. SW Engineer, Scott County Consulting, Silicon Valley

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH 0/2] ACPI: PM: s2idle: Fix possible suspend lock-up
  2019-11-29 20:35 ` [PATCH 0/2] ACPI: PM: s2idle: Fix possible suspend lock-up Kenneth R. Crudup
@ 2019-12-01 19:54   ` Rafael J. Wysocki
  0 siblings, 0 replies; 5+ messages in thread
From: Rafael J. Wysocki @ 2019-12-01 19:54 UTC (permalink / raw)
  To: Kenneth R. Crudup; +Cc: Rafael J. Wysocki, Linux ACPI, Linux PM, LKML

On Fri, Nov 29, 2019 at 9:35 PM Kenneth R. Crudup <kenny@panix.com> wrote:
>
>
> On Thu, 28 Nov 2019, Rafael J. Wysocki wrote:
>
> > Please test this series and let me know if it works for you too.
>
> I've tested this quite a few times against the issues reported on the Linux-PM
> mailing list ("resume failures if the charger is plugged in while suspended
> and the battery is at < 90%" and "spurious wakeups from the EC for non-power
> events") and this patchset has fixed both of these issues (the latter is
> especially surprising and welcome).
>
> I'm pretty happy with it; it's been a while since I've had a consistently
> cold laptop while suspended. Much thanks!

Thanks a lot for the testing and feedback, much appreciated!

I'll queue up this series for the next PM pull request.

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2019-12-01 19:55 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-11-28 22:42 [PATCH 0/2] ACPI: PM: s2idle: Fix possible suspend lock-up Rafael J. Wysocki
2019-11-28 22:47 ` [PATCH 1/2] ACPI: EC: Rework flushing of pending work Rafael J. Wysocki
2019-11-28 22:50 ` [PATCH 2/2] ACPI: PM: s2idle: Rework ACPI events synchronization Rafael J. Wysocki
2019-11-29 20:35 ` [PATCH 0/2] ACPI: PM: s2idle: Fix possible suspend lock-up Kenneth R. Crudup
2019-12-01 19:54   ` Rafael J. Wysocki

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).