* [RFC][PATCH] PM: Avoid losing wakeup events during suspend @ 2010-06-19 22:05 Rafael J. Wysocki 2010-06-20 5:52 ` mark gross 2010-06-20 16:28 ` Alan Stern 0 siblings, 2 replies; 75+ messages in thread From: Rafael J. Wysocki @ 2010-06-19 22:05 UTC (permalink / raw) To: linux-pm Cc: Alan Stern, Matthew Garrett, Linux Kernel Mailing List, Dmitry Torokhov, Arve Hjønnevåg, Neil Brown, mark gross Hi, One of the arguments during the suspend blockers discussion was that the mainline kernel didn't contain any mechanisms allowing it to avoid losing wakeup events during system suspend. Generally, there are two problems in that area. First, if a wakeup event occurs exactly at the same time when /sys/power/state is being written to, the even may be delivered to user space right before the freezing of it, in which case the user space consumer of the event may not be able to process it before the system is suspended. Second, if a wakeup event occurs after user space has been frozen and that event is not a wakeup interrupt, the kernel will not react to it and the system will be suspended. The following patch illustrates my idea of how these two problems may be addressed. It introduces a new global sysfs attribute, /sys/power/wakeup_count, associated with a running counter of wakeup events and a helper function, pm_wakeup_event(), that may be used by kernel subsystems to increment the wakeup events counter. /sys/power/wakeup_count may be read from or written to by user space. Reads will always succeed and return the current value of the wakeup events counter. Writes, however, will only succeed if the written number is equal to the current value of the wakeup events counter. If a write is successful, it will cause the kernel to save the current value of the wakeup events counter and to compare the saved number with the current value of the counter at certain points of the subsequent suspend (or hibernate) sequence. If the two values don't match, the suspend will be aborted just as though a wakeup interrupt happened. Reading from /sys/power/wakeup_count again will turn that mechanism off. The assumption is that there's a user space power manager that will first read from /sys/power/wakeup_count. Then it will check all user space consumers of wakeup events known to it for unprocessed events. If there are any, it will wait for them to be processed and repeat. In turn, if there are not any, it will try to write to /sys/power/wakeup_count and if the write is successful, it will write to /sys/power/state to start suspend, so if any wakeup events accur past that point, they will be noticed by the kernel and will eventually cause the suspend to be aborted. In addition to the above, the patch adds a wakeup events counter to the power member of struct device and makes these per-device wakeup event counters available via sysfs, so that it's possible to check the activity of various wakeup event sources within the kernel. To illustrate how subsystems can use pm_wakeup_event(), I added it to the PCI runtime PM wakeup-handling code. At the moment the patch only contains code changes (ie. no documentation), but I'm going to add comments etc. if people like the idea. Please tell me what you think. Rafael --- drivers/base/power/Makefile | 2 - drivers/base/power/main.c | 1 drivers/base/power/power.h | 3 + drivers/base/power/sysfs.c | 9 ++++ drivers/base/power/wakeup.c | 74 ++++++++++++++++++++++++++++++++++++++++ drivers/pci/pci-acpi.c | 2 + drivers/pci/pcie/pme/pcie_pme.c | 2 + include/linux/pm.h | 6 +++ kernel/power/hibernate.c | 14 ++++--- kernel/power/main.c | 24 ++++++++++++ kernel/power/power.h | 6 +++ kernel/power/suspend.c | 2 - 12 files changed, 138 insertions(+), 7 deletions(-) Index: linux-2.6/kernel/power/main.c =================================================================== --- linux-2.6.orig/kernel/power/main.c +++ linux-2.6/kernel/power/main.c @@ -204,6 +204,28 @@ static ssize_t state_store(struct kobjec power_attr(state); +static ssize_t wakeup_count_show(struct kobject *kobj, + struct kobj_attribute *attr, + char *buf) +{ + return sprintf(buf, "%lu\n", pm_get_wakeup_count()); +} + +static ssize_t wakeup_count_store(struct kobject *kobj, + struct kobj_attribute *attr, + const char *buf, size_t n) +{ + unsigned long val; + + if (sscanf(buf, "%lu", &val) == 1) { + if (pm_save_wakeup_count(val)) + return n; + } + return -EINVAL; +} + +power_attr(wakeup_count); + #ifdef CONFIG_PM_TRACE int pm_trace_enabled; @@ -236,6 +258,7 @@ static struct attribute * g[] = { #endif #ifdef CONFIG_PM_SLEEP &pm_async_attr.attr, + &wakeup_count_attr.attr, #ifdef CONFIG_PM_DEBUG &pm_test_attr.attr, #endif @@ -266,6 +289,7 @@ static int __init pm_init(void) int error = pm_start_workqueue(); if (error) return error; + pm_wakeup_events_init(); power_kobj = kobject_create_and_add("power", NULL); if (!power_kobj) return -ENOMEM; Index: linux-2.6/drivers/base/power/wakeup.c =================================================================== --- /dev/null +++ linux-2.6/drivers/base/power/wakeup.c @@ -0,0 +1,74 @@ + +#include <linux/device.h> +#include <linux/pm.h> + +static unsigned long event_count; +static unsigned long saved_event_count; +static bool events_check_enabled; +static spinlock_t events_lock; + +void pm_wakeup_events_init(void) +{ + spin_lock_init(&events_lock); +} + +void pm_wakeup_event(struct device *dev) +{ + unsigned long flags; + + spin_lock_irqsave(&events_lock, flags); + event_count++; + if (dev) + dev->power.wakeup_count++; + spin_unlock_irqrestore(&events_lock, flags); +} + +bool pm_check_wakeup_events(bool enable) +{ + unsigned long flags; + bool ret; + + spin_lock_irqsave(&events_lock, flags); + ret = !events_check_enabled || (event_count == saved_event_count); + events_check_enabled = enable; + spin_unlock_irqrestore(&events_lock, flags); + return ret; +} + +unsigned long pm_get_wakeup_count(void) +{ + unsigned long flags; + unsigned long count; + + spin_lock_irqsave(&events_lock, flags); + events_check_enabled = false; + count = event_count; + spin_unlock_irqrestore(&events_lock, flags); + return count; +} + +bool pm_save_wakeup_count(unsigned long count) +{ + unsigned long flags; + bool ret = false; + + spin_lock_irqsave(&events_lock, flags); + if (count == event_count) { + saved_event_count = count; + events_check_enabled = true; + ret = true; + } + spin_unlock_irqrestore(&events_lock, flags); + return ret; +} + +unsigned long pm_dev_wakeup_count(struct device *dev) +{ + unsigned long flags; + unsigned long count; + + spin_lock_irqsave(&events_lock, flags); + count = dev->power.wakeup_count; + spin_unlock_irqrestore(&events_lock, flags); + return count; +} Index: linux-2.6/include/linux/pm.h =================================================================== --- linux-2.6.orig/include/linux/pm.h +++ linux-2.6/include/linux/pm.h @@ -457,6 +457,7 @@ struct dev_pm_info { #ifdef CONFIG_PM_SLEEP struct list_head entry; struct completion completion; + unsigned long wakeup_count; #endif #ifdef CONFIG_PM_RUNTIME struct timer_list suspend_timer; @@ -552,6 +553,9 @@ extern void __suspend_report_result(cons } while (0) extern void device_pm_wait_for_dev(struct device *sub, struct device *dev); + +/* drivers/base/power/wakeup.c */ +extern void pm_wakeup_event(struct device *dev); #else /* !CONFIG_PM_SLEEP */ #define device_pm_lock() do {} while (0) @@ -565,6 +569,8 @@ static inline int dpm_suspend_start(pm_m #define suspend_report_result(fn, ret) do {} while (0) static inline void device_pm_wait_for_dev(struct device *a, struct device *b) {} + +static inline void pm_wakeup_event(struct device *dev) {} #endif /* !CONFIG_PM_SLEEP */ /* How to reorder dpm_list after device_move() */ Index: linux-2.6/drivers/base/power/Makefile =================================================================== --- linux-2.6.orig/drivers/base/power/Makefile +++ linux-2.6/drivers/base/power/Makefile @@ -1,5 +1,5 @@ obj-$(CONFIG_PM) += sysfs.o -obj-$(CONFIG_PM_SLEEP) += main.o +obj-$(CONFIG_PM_SLEEP) += main.o wakeup.o obj-$(CONFIG_PM_RUNTIME) += runtime.o obj-$(CONFIG_PM_OPS) += generic_ops.o obj-$(CONFIG_PM_TRACE_RTC) += trace.o Index: linux-2.6/drivers/base/power/main.c =================================================================== --- linux-2.6.orig/drivers/base/power/main.c +++ linux-2.6/drivers/base/power/main.c @@ -59,6 +59,7 @@ void device_pm_init(struct device *dev) { dev->power.status = DPM_ON; init_completion(&dev->power.completion); + dev->power.wakeup_count = 0; pm_runtime_init(dev); } Index: linux-2.6/kernel/power/power.h =================================================================== --- linux-2.6.orig/kernel/power/power.h +++ linux-2.6/kernel/power/power.h @@ -184,6 +184,12 @@ static inline void suspend_test_finish(c #ifdef CONFIG_PM_SLEEP /* kernel/power/main.c */ extern int pm_notifier_call_chain(unsigned long val); + +/* drivers/base/power/wakeup.c */ +extern void pm_wakeup_events_init(void); +extern bool pm_check_wakeup_events(bool enable); +extern unsigned long pm_get_wakeup_count(void); +extern bool pm_save_wakeup_count(unsigned long count); #endif #ifdef CONFIG_HIGHMEM Index: linux-2.6/kernel/power/suspend.c =================================================================== --- linux-2.6.orig/kernel/power/suspend.c +++ linux-2.6/kernel/power/suspend.c @@ -157,7 +157,7 @@ static int suspend_enter(suspend_state_t error = sysdev_suspend(PMSG_SUSPEND); if (!error) { - if (!suspend_test(TEST_CORE)) + if (!suspend_test(TEST_CORE) && pm_check_wakeup_events(false)) error = suspend_ops->enter(state); sysdev_resume(); } Index: linux-2.6/kernel/power/hibernate.c =================================================================== --- linux-2.6.orig/kernel/power/hibernate.c +++ linux-2.6/kernel/power/hibernate.c @@ -277,7 +277,7 @@ static int create_image(int platform_mod goto Enable_irqs; } - if (hibernation_test(TEST_CORE)) + if (hibernation_test(TEST_CORE) || !pm_check_wakeup_events(true)) goto Power_up; in_suspend = 1; @@ -511,14 +511,18 @@ int hibernation_platform_enter(void) local_irq_disable(); sysdev_suspend(PMSG_HIBERNATE); + if (!pm_check_wakeup_events(false)) + goto Power_up; + hibernation_ops->enter(); /* We should never get here */ while (1); - /* - * We don't need to reenable the nonboot CPUs or resume consoles, since - * the system is going to be halted anyway. - */ + Power_up: + sysdev_resume(); + local_irq_enable(); + enable_nonboot_cpus(); + Platform_finish: hibernation_ops->finish(); Index: linux-2.6/drivers/pci/pci-acpi.c =================================================================== --- linux-2.6.orig/drivers/pci/pci-acpi.c +++ linux-2.6/drivers/pci/pci-acpi.c @@ -48,6 +48,8 @@ static void pci_acpi_wake_dev(acpi_handl if (event == ACPI_NOTIFY_DEVICE_WAKE && pci_dev) { pci_check_pme_status(pci_dev); pm_runtime_resume(&pci_dev->dev); + if (device_may_wakeup(&pci_dev->dev)) + pm_wakeup_event(&pci_dev->dev); if (pci_dev->subordinate) pci_pme_wakeup_bus(pci_dev->subordinate); } Index: linux-2.6/drivers/pci/pcie/pme/pcie_pme.c =================================================================== --- linux-2.6.orig/drivers/pci/pcie/pme/pcie_pme.c +++ linux-2.6/drivers/pci/pcie/pme/pcie_pme.c @@ -147,6 +147,8 @@ static bool pcie_pme_walk_bus(struct pci /* Skip PCIe devices in case we started from a root port. */ if (!pci_is_pcie(dev) && pci_check_pme_status(dev)) { pm_request_resume(&dev->dev); + if (device_may_wakeup(&dev->dev)) + pm_wakeup_event(&dev->dev); ret = true; } Index: linux-2.6/drivers/base/power/power.h =================================================================== --- linux-2.6.orig/drivers/base/power/power.h +++ linux-2.6/drivers/base/power/power.h @@ -30,6 +30,9 @@ extern void device_pm_move_before(struct extern void device_pm_move_after(struct device *, struct device *); extern void device_pm_move_last(struct device *); +/* drivers/base/power/wakeup.c */ +extern unsigned long pm_dev_wakeup_count(struct device *dev); + #else /* !CONFIG_PM_SLEEP */ static inline void device_pm_init(struct device *dev) Index: linux-2.6/drivers/base/power/sysfs.c =================================================================== --- linux-2.6.orig/drivers/base/power/sysfs.c +++ linux-2.6/drivers/base/power/sysfs.c @@ -144,6 +144,14 @@ wake_store(struct device * dev, struct d static DEVICE_ATTR(wakeup, 0644, wake_show, wake_store); +static ssize_t wakeup_count_show(struct device *dev, + struct device_attribute *attr, char *buf) +{ + return sprintf(buf, "%lu\n", pm_dev_wakeup_count(dev)); +} + +static DEVICE_ATTR(wakeup_count, 0444, wakeup_count_show, NULL); + #ifdef CONFIG_PM_ADVANCED_DEBUG #ifdef CONFIG_PM_RUNTIME @@ -230,6 +238,7 @@ static struct attribute * power_attrs[] &dev_attr_control.attr, #endif &dev_attr_wakeup.attr, + &dev_attr_wakeup_count.attr, #ifdef CONFIG_PM_ADVANCED_DEBUG &dev_attr_async.attr, #ifdef CONFIG_PM_RUNTIME ^ permalink raw reply [flat|nested] 75+ messages in thread
* Re: [RFC][PATCH] PM: Avoid losing wakeup events during suspend 2010-06-19 22:05 [RFC][PATCH] PM: Avoid losing wakeup events during suspend Rafael J. Wysocki @ 2010-06-20 5:52 ` mark gross 2010-06-20 12:49 ` Rafael J. Wysocki 2010-06-20 16:28 ` Alan Stern 1 sibling, 1 reply; 75+ messages in thread From: mark gross @ 2010-06-20 5:52 UTC (permalink / raw) To: Rafael J. Wysocki Cc: linux-pm, Alan Stern, Matthew Garrett, Linux Kernel Mailing List, Dmitry Torokhov, Arve Hjønnevåg, Neil Brown, mark gross On Sun, Jun 20, 2010 at 12:05:35AM +0200, Rafael J. Wysocki wrote: > Hi, > > One of the arguments during the suspend blockers discussion was that the > mainline kernel didn't contain any mechanisms allowing it to avoid losing > wakeup events during system suspend. > > Generally, there are two problems in that area. First, if a wakeup event > occurs exactly at the same time when /sys/power/state is being written to, > the even may be delivered to user space right before the freezing of it, > in which case the user space consumer of the event may not be able to process yes this is racy. souldn't the wakeup event handers/driver force a user mode ACK before they stop failing suspend attempts? > it before the system is suspended. Second, if a wakeup event occurs after user > space has been frozen and that event is not a wakeup interrupt, the kernel will > not react to it and the system will be suspended. If its not a wakeup interrupt is it not fair to allow the suspend to happen even if its handler's are "in flight" at suspend time? > > The following patch illustrates my idea of how these two problems may be > addressed. It introduces a new global sysfs attribute, > /sys/power/wakeup_count, associated with a running counter of wakeup events > and a helper function, pm_wakeup_event(), that may be used by kernel subsystems > to increment the wakeup events counter. > > /sys/power/wakeup_count may be read from or written to by user space. Reads > will always succeed and return the current value of the wakeup events counter. > Writes, however, will only succeed if the written number is equal to the > current value of the wakeup events counter. If a write is successful, it will > cause the kernel to save the current value of the wakeup events counter and > to compare the saved number with the current value of the counter at certain > points of the subsequent suspend (or hibernate) sequence. If the two values > don't match, the suspend will be aborted just as though a wakeup interrupt > happened. Reading from /sys/power/wakeup_count again will turn that mechanism > off. why would you want to turn it off? > > The assumption is that there's a user space power manager that will first > read from /sys/power/wakeup_count. Then it will check all user space consumers > of wakeup events known to it for unprocessed events. If there are any, it will > wait for them to be processed and repeat. In turn, if there are not any, > it will try to write to /sys/power/wakeup_count and if the write is successful, > it will write to /sys/power/state to start suspend, so if any wakeup events > accur past that point, they will be noticed by the kernel and will eventually > cause the suspend to be aborted. > > In addition to the above, the patch adds a wakeup events counter to the > power member of struct device and makes these per-device wakeup event counters > available via sysfs, so that it's possible to check the activity of various > wakeup event sources within the kernel. > > To illustrate how subsystems can use pm_wakeup_event(), I added it to the > PCI runtime PM wakeup-handling code. > > At the moment the patch only contains code changes (ie. no documentation), > but I'm going to add comments etc. if people like the idea. > > Please tell me what you think. > > Rafael > > --- > drivers/base/power/Makefile | 2 - > drivers/base/power/main.c | 1 > drivers/base/power/power.h | 3 + > drivers/base/power/sysfs.c | 9 ++++ > drivers/base/power/wakeup.c | 74 ++++++++++++++++++++++++++++++++++++++++ > drivers/pci/pci-acpi.c | 2 + > drivers/pci/pcie/pme/pcie_pme.c | 2 + > include/linux/pm.h | 6 +++ > kernel/power/hibernate.c | 14 ++++--- > kernel/power/main.c | 24 ++++++++++++ > kernel/power/power.h | 6 +++ > kernel/power/suspend.c | 2 - > 12 files changed, 138 insertions(+), 7 deletions(-) > > Index: linux-2.6/kernel/power/main.c > =================================================================== > --- linux-2.6.orig/kernel/power/main.c > +++ linux-2.6/kernel/power/main.c > @@ -204,6 +204,28 @@ static ssize_t state_store(struct kobjec > > power_attr(state); > > +static ssize_t wakeup_count_show(struct kobject *kobj, > + struct kobj_attribute *attr, > + char *buf) > +{ > + return sprintf(buf, "%lu\n", pm_get_wakeup_count()); > +} > + > +static ssize_t wakeup_count_store(struct kobject *kobj, > + struct kobj_attribute *attr, > + const char *buf, size_t n) > +{ > + unsigned long val; > + > + if (sscanf(buf, "%lu", &val) == 1) { > + if (pm_save_wakeup_count(val)) > + return n; > + } > + return -EINVAL; > +} > + > +power_attr(wakeup_count); > + > #ifdef CONFIG_PM_TRACE > int pm_trace_enabled; > > @@ -236,6 +258,7 @@ static struct attribute * g[] = { > #endif > #ifdef CONFIG_PM_SLEEP > &pm_async_attr.attr, > + &wakeup_count_attr.attr, > #ifdef CONFIG_PM_DEBUG > &pm_test_attr.attr, > #endif > @@ -266,6 +289,7 @@ static int __init pm_init(void) > int error = pm_start_workqueue(); > if (error) > return error; > + pm_wakeup_events_init(); > power_kobj = kobject_create_and_add("power", NULL); > if (!power_kobj) > return -ENOMEM; > Index: linux-2.6/drivers/base/power/wakeup.c > =================================================================== > --- /dev/null > +++ linux-2.6/drivers/base/power/wakeup.c > @@ -0,0 +1,74 @@ > + > +#include <linux/device.h> > +#include <linux/pm.h> > + > +static unsigned long event_count; > +static unsigned long saved_event_count; what about over flow issues? > +static bool events_check_enabled; > +static spinlock_t events_lock; > + > +void pm_wakeup_events_init(void) > +{ > + spin_lock_init(&events_lock); > +} > + > +void pm_wakeup_event(struct device *dev) > +{ > + unsigned long flags; > + > + spin_lock_irqsave(&events_lock, flags); > + event_count++; should event_count be an atomic type so we can not bother with taking the evnets_lock? > + if (dev) > + dev->power.wakeup_count++; > + spin_unlock_irqrestore(&events_lock, flags); > +} > + > +bool pm_check_wakeup_events(bool enable) > +{ > + unsigned long flags; > + bool ret; > + > + spin_lock_irqsave(&events_lock, flags); > + ret = !events_check_enabled || (event_count == saved_event_count); I'm not getting the events_check_enbled flag yet. > + events_check_enabled = enable; I'm not sure if this is the right thing depending on all the different ways the boolians are interacting with eachother. Which is a red flag to me. This code is confusing. I'll look at it some more when I'm fresh tomorrow. --mgross > + spin_unlock_irqrestore(&events_lock, flags); > + return ret; > +} > + > +unsigned long pm_get_wakeup_count(void) > +{ > + unsigned long flags; > + unsigned long count; > + > + spin_lock_irqsave(&events_lock, flags); > + events_check_enabled = false; > + count = event_count; > + spin_unlock_irqrestore(&events_lock, flags); > + return count; > +} > + > +bool pm_save_wakeup_count(unsigned long count) > +{ > + unsigned long flags; > + bool ret = false; > + > + spin_lock_irqsave(&events_lock, flags); > + if (count == event_count) { > + saved_event_count = count; > + events_check_enabled = true; > + ret = true; > + } > + spin_unlock_irqrestore(&events_lock, flags); > + return ret; > +} > + > +unsigned long pm_dev_wakeup_count(struct device *dev) > +{ > + unsigned long flags; > + unsigned long count; > + > + spin_lock_irqsave(&events_lock, flags); > + count = dev->power.wakeup_count; > + spin_unlock_irqrestore(&events_lock, flags); > + return count; > +} > Index: linux-2.6/include/linux/pm.h > =================================================================== > --- linux-2.6.orig/include/linux/pm.h > +++ linux-2.6/include/linux/pm.h > @@ -457,6 +457,7 @@ struct dev_pm_info { > #ifdef CONFIG_PM_SLEEP > struct list_head entry; > struct completion completion; > + unsigned long wakeup_count; > #endif > #ifdef CONFIG_PM_RUNTIME > struct timer_list suspend_timer; > @@ -552,6 +553,9 @@ extern void __suspend_report_result(cons > } while (0) > > extern void device_pm_wait_for_dev(struct device *sub, struct device *dev); > + > +/* drivers/base/power/wakeup.c */ > +extern void pm_wakeup_event(struct device *dev); > #else /* !CONFIG_PM_SLEEP */ > > #define device_pm_lock() do {} while (0) > @@ -565,6 +569,8 @@ static inline int dpm_suspend_start(pm_m > #define suspend_report_result(fn, ret) do {} while (0) > > static inline void device_pm_wait_for_dev(struct device *a, struct device *b) {} > + > +static inline void pm_wakeup_event(struct device *dev) {} > #endif /* !CONFIG_PM_SLEEP */ > > /* How to reorder dpm_list after device_move() */ > Index: linux-2.6/drivers/base/power/Makefile > =================================================================== > --- linux-2.6.orig/drivers/base/power/Makefile > +++ linux-2.6/drivers/base/power/Makefile > @@ -1,5 +1,5 @@ > obj-$(CONFIG_PM) += sysfs.o > -obj-$(CONFIG_PM_SLEEP) += main.o > +obj-$(CONFIG_PM_SLEEP) += main.o wakeup.o > obj-$(CONFIG_PM_RUNTIME) += runtime.o > obj-$(CONFIG_PM_OPS) += generic_ops.o > obj-$(CONFIG_PM_TRACE_RTC) += trace.o > Index: linux-2.6/drivers/base/power/main.c > =================================================================== > --- linux-2.6.orig/drivers/base/power/main.c > +++ linux-2.6/drivers/base/power/main.c > @@ -59,6 +59,7 @@ void device_pm_init(struct device *dev) > { > dev->power.status = DPM_ON; > init_completion(&dev->power.completion); > + dev->power.wakeup_count = 0; > pm_runtime_init(dev); > } > > Index: linux-2.6/kernel/power/power.h > =================================================================== > --- linux-2.6.orig/kernel/power/power.h > +++ linux-2.6/kernel/power/power.h > @@ -184,6 +184,12 @@ static inline void suspend_test_finish(c > #ifdef CONFIG_PM_SLEEP > /* kernel/power/main.c */ > extern int pm_notifier_call_chain(unsigned long val); > + > +/* drivers/base/power/wakeup.c */ > +extern void pm_wakeup_events_init(void); > +extern bool pm_check_wakeup_events(bool enable); > +extern unsigned long pm_get_wakeup_count(void); > +extern bool pm_save_wakeup_count(unsigned long count); > #endif > > #ifdef CONFIG_HIGHMEM > Index: linux-2.6/kernel/power/suspend.c > =================================================================== > --- linux-2.6.orig/kernel/power/suspend.c > +++ linux-2.6/kernel/power/suspend.c > @@ -157,7 +157,7 @@ static int suspend_enter(suspend_state_t > > error = sysdev_suspend(PMSG_SUSPEND); > if (!error) { > - if (!suspend_test(TEST_CORE)) > + if (!suspend_test(TEST_CORE) && pm_check_wakeup_events(false)) > error = suspend_ops->enter(state); > sysdev_resume(); > } > Index: linux-2.6/kernel/power/hibernate.c > =================================================================== > --- linux-2.6.orig/kernel/power/hibernate.c > +++ linux-2.6/kernel/power/hibernate.c > @@ -277,7 +277,7 @@ static int create_image(int platform_mod > goto Enable_irqs; > } > > - if (hibernation_test(TEST_CORE)) > + if (hibernation_test(TEST_CORE) || !pm_check_wakeup_events(true)) > goto Power_up; > > in_suspend = 1; > @@ -511,14 +511,18 @@ int hibernation_platform_enter(void) > > local_irq_disable(); > sysdev_suspend(PMSG_HIBERNATE); > + if (!pm_check_wakeup_events(false)) > + goto Power_up; > + > hibernation_ops->enter(); > /* We should never get here */ > while (1); > > - /* > - * We don't need to reenable the nonboot CPUs or resume consoles, since > - * the system is going to be halted anyway. > - */ > + Power_up: > + sysdev_resume(); > + local_irq_enable(); > + enable_nonboot_cpus(); > + > Platform_finish: > hibernation_ops->finish(); > > Index: linux-2.6/drivers/pci/pci-acpi.c > =================================================================== > --- linux-2.6.orig/drivers/pci/pci-acpi.c > +++ linux-2.6/drivers/pci/pci-acpi.c > @@ -48,6 +48,8 @@ static void pci_acpi_wake_dev(acpi_handl > if (event == ACPI_NOTIFY_DEVICE_WAKE && pci_dev) { > pci_check_pme_status(pci_dev); > pm_runtime_resume(&pci_dev->dev); > + if (device_may_wakeup(&pci_dev->dev)) > + pm_wakeup_event(&pci_dev->dev); > if (pci_dev->subordinate) > pci_pme_wakeup_bus(pci_dev->subordinate); > } > Index: linux-2.6/drivers/pci/pcie/pme/pcie_pme.c > =================================================================== > --- linux-2.6.orig/drivers/pci/pcie/pme/pcie_pme.c > +++ linux-2.6/drivers/pci/pcie/pme/pcie_pme.c > @@ -147,6 +147,8 @@ static bool pcie_pme_walk_bus(struct pci > /* Skip PCIe devices in case we started from a root port. */ > if (!pci_is_pcie(dev) && pci_check_pme_status(dev)) { > pm_request_resume(&dev->dev); > + if (device_may_wakeup(&dev->dev)) > + pm_wakeup_event(&dev->dev); > ret = true; > } > > Index: linux-2.6/drivers/base/power/power.h > =================================================================== > --- linux-2.6.orig/drivers/base/power/power.h > +++ linux-2.6/drivers/base/power/power.h > @@ -30,6 +30,9 @@ extern void device_pm_move_before(struct > extern void device_pm_move_after(struct device *, struct device *); > extern void device_pm_move_last(struct device *); > > +/* drivers/base/power/wakeup.c */ > +extern unsigned long pm_dev_wakeup_count(struct device *dev); > + > #else /* !CONFIG_PM_SLEEP */ > > static inline void device_pm_init(struct device *dev) > Index: linux-2.6/drivers/base/power/sysfs.c > =================================================================== > --- linux-2.6.orig/drivers/base/power/sysfs.c > +++ linux-2.6/drivers/base/power/sysfs.c > @@ -144,6 +144,14 @@ wake_store(struct device * dev, struct d > > static DEVICE_ATTR(wakeup, 0644, wake_show, wake_store); > > +static ssize_t wakeup_count_show(struct device *dev, > + struct device_attribute *attr, char *buf) > +{ > + return sprintf(buf, "%lu\n", pm_dev_wakeup_count(dev)); > +} > + > +static DEVICE_ATTR(wakeup_count, 0444, wakeup_count_show, NULL); > + > #ifdef CONFIG_PM_ADVANCED_DEBUG > #ifdef CONFIG_PM_RUNTIME > > @@ -230,6 +238,7 @@ static struct attribute * power_attrs[] > &dev_attr_control.attr, > #endif > &dev_attr_wakeup.attr, > + &dev_attr_wakeup_count.attr, > #ifdef CONFIG_PM_ADVANCED_DEBUG > &dev_attr_async.attr, > #ifdef CONFIG_PM_RUNTIME > ^ permalink raw reply [flat|nested] 75+ messages in thread
* Re: [RFC][PATCH] PM: Avoid losing wakeup events during suspend 2010-06-20 5:52 ` mark gross @ 2010-06-20 12:49 ` Rafael J. Wysocki 2010-06-20 23:13 ` mark gross 0 siblings, 1 reply; 75+ messages in thread From: Rafael J. Wysocki @ 2010-06-20 12:49 UTC (permalink / raw) To: markgross Cc: linux-pm, Alan Stern, Matthew Garrett, Linux Kernel Mailing List, Dmitry Torokhov, Arve Hjønnevåg, Neil Brown, mark gross On Sunday, June 20, 2010, mark gross wrote: > On Sun, Jun 20, 2010 at 12:05:35AM +0200, Rafael J. Wysocki wrote: > > Hi, > > > > One of the arguments during the suspend blockers discussion was that the > > mainline kernel didn't contain any mechanisms allowing it to avoid losing > > wakeup events during system suspend. > > > > Generally, there are two problems in that area. First, if a wakeup event > > occurs exactly at the same time when /sys/power/state is being written to, > > the even may be delivered to user space right before the freezing of it, > > in which case the user space consumer of the event may not be able to process > yes this is racy. souldn't the wakeup event handers/driver force a user > mode ACK before they stop failing suspend attempts? I'm not sure what you mean. There are wake-up events without any user space consumers, like wake-on-LAN. > > it before the system is suspended. Second, if a wakeup event occurs after user > > space has been frozen and that event is not a wakeup interrupt, the kernel will > > not react to it and the system will be suspended. > > If its not a wakeup interrupt is it not fair to allow the suspend to > happen even if its handler's are "in flight" at suspend time? A wakeup event need not be an interrupt, or at least there are interrupts that have other meanings, like ACPI SCI. > > The following patch illustrates my idea of how these two problems may be > > addressed. It introduces a new global sysfs attribute, > > /sys/power/wakeup_count, associated with a running counter of wakeup events > > and a helper function, pm_wakeup_event(), that may be used by kernel subsystems > > to increment the wakeup events counter. > > > > /sys/power/wakeup_count may be read from or written to by user space. Reads > > will always succeed and return the current value of the wakeup events counter. > > Writes, however, will only succeed if the written number is equal to the > > current value of the wakeup events counter. If a write is successful, it will > > cause the kernel to save the current value of the wakeup events counter and > > to compare the saved number with the current value of the counter at certain > > points of the subsequent suspend (or hibernate) sequence. If the two values > > don't match, the suspend will be aborted just as though a wakeup interrupt > > happened. Reading from /sys/power/wakeup_count again will turn that mechanism > > off. > > why would you want to turn it off? In some cases I may want to suspend/hibernate losing some wakeup events, like for example in the case of emergency hibernation on critical battery. > > The assumption is that there's a user space power manager that will first > > read from /sys/power/wakeup_count. Then it will check all user space consumers > > of wakeup events known to it for unprocessed events. If there are any, it will > > wait for them to be processed and repeat. In turn, if there are not any, > > it will try to write to /sys/power/wakeup_count and if the write is successful, > > it will write to /sys/power/state to start suspend, so if any wakeup events > > accur past that point, they will be noticed by the kernel and will eventually > > cause the suspend to be aborted. > > > > In addition to the above, the patch adds a wakeup events counter to the > > power member of struct device and makes these per-device wakeup event counters > > available via sysfs, so that it's possible to check the activity of various > > wakeup event sources within the kernel. > > > > To illustrate how subsystems can use pm_wakeup_event(), I added it to the > > PCI runtime PM wakeup-handling code. > > > > At the moment the patch only contains code changes (ie. no documentation), > > but I'm going to add comments etc. if people like the idea. > > > > Please tell me what you think. > > > > Rafael > > > > --- > > drivers/base/power/Makefile | 2 - > > drivers/base/power/main.c | 1 > > drivers/base/power/power.h | 3 + > > drivers/base/power/sysfs.c | 9 ++++ > > drivers/base/power/wakeup.c | 74 ++++++++++++++++++++++++++++++++++++++++ > > drivers/pci/pci-acpi.c | 2 + > > drivers/pci/pcie/pme/pcie_pme.c | 2 + > > include/linux/pm.h | 6 +++ > > kernel/power/hibernate.c | 14 ++++--- > > kernel/power/main.c | 24 ++++++++++++ > > kernel/power/power.h | 6 +++ > > kernel/power/suspend.c | 2 - > > 12 files changed, 138 insertions(+), 7 deletions(-) > > > > Index: linux-2.6/kernel/power/main.c > > =================================================================== > > --- linux-2.6.orig/kernel/power/main.c > > +++ linux-2.6/kernel/power/main.c > > @@ -204,6 +204,28 @@ static ssize_t state_store(struct kobjec > > > > power_attr(state); > > > > +static ssize_t wakeup_count_show(struct kobject *kobj, > > + struct kobj_attribute *attr, > > + char *buf) > > +{ > > + return sprintf(buf, "%lu\n", pm_get_wakeup_count()); > > +} > > + > > +static ssize_t wakeup_count_store(struct kobject *kobj, > > + struct kobj_attribute *attr, > > + const char *buf, size_t n) > > +{ > > + unsigned long val; > > + > > + if (sscanf(buf, "%lu", &val) == 1) { > > + if (pm_save_wakeup_count(val)) > > + return n; > > + } > > + return -EINVAL; > > +} > > + > > +power_attr(wakeup_count); > > + > > #ifdef CONFIG_PM_TRACE > > int pm_trace_enabled; > > > > @@ -236,6 +258,7 @@ static struct attribute * g[] = { > > #endif > > #ifdef CONFIG_PM_SLEEP > > &pm_async_attr.attr, > > + &wakeup_count_attr.attr, > > #ifdef CONFIG_PM_DEBUG > > &pm_test_attr.attr, > > #endif > > @@ -266,6 +289,7 @@ static int __init pm_init(void) > > int error = pm_start_workqueue(); > > if (error) > > return error; > > + pm_wakeup_events_init(); > > power_kobj = kobject_create_and_add("power", NULL); > > if (!power_kobj) > > return -ENOMEM; > > Index: linux-2.6/drivers/base/power/wakeup.c > > =================================================================== > > --- /dev/null > > +++ linux-2.6/drivers/base/power/wakeup.c > > @@ -0,0 +1,74 @@ > > + > > +#include <linux/device.h> > > +#include <linux/pm.h> > > + > > +static unsigned long event_count; > > +static unsigned long saved_event_count; > > what about over flow issues? There's no issue AFAICS. It only matters if the value is different from the previous one after incrementation. > > +static bool events_check_enabled; > > +static spinlock_t events_lock; > > + > > +void pm_wakeup_events_init(void) > > +{ > > + spin_lock_init(&events_lock); > > +} > > + > > +void pm_wakeup_event(struct device *dev) > > +{ > > + unsigned long flags; > > + > > + spin_lock_irqsave(&events_lock, flags); > > + event_count++; > should event_count be an atomic type so we can not bother with taking > the evnets_lock? The lock is there, because two counters are incremented at the same time. Also, in some other places the counter is accessed along with the enable flag, so there are two operations that need to be done in the same critical section. > > + if (dev) > > + dev->power.wakeup_count++; > > + spin_unlock_irqrestore(&events_lock, flags); > > +} > > + > > +bool pm_check_wakeup_events(bool enable) > > +{ > > + unsigned long flags; > > + bool ret; > > + > > + spin_lock_irqsave(&events_lock, flags); > > + ret = !events_check_enabled || (event_count == saved_event_count); > I'm not getting the events_check_enbled flag yet. I tells the PM core whether or not to check wakeup events during suspend. The checking is only enabled after a successful write to /sys/power/wakeup_count, it is disabled by reads and by the final check right before the system enters suspend. [That's because the "saved" value from the previous suspend should not be used for checking wakeup events during the next one.] > > + events_check_enabled = enable; > I'm not sure if this is the right thing depending on all the different > ways the boolians are interacting with eachother. > > Which is a red flag to me. This code is confusing. Well, I could use check_wakeup_events() and check_and_disable_wakeup_events() (the latter is only necessary for the final check), but I'm not sure if that's going to be better. > I'll look at it some more when I'm fresh tomorrow. Thanks for the comments. Rafael ^ permalink raw reply [flat|nested] 75+ messages in thread
* Re: [RFC][PATCH] PM: Avoid losing wakeup events during suspend 2010-06-20 12:49 ` Rafael J. Wysocki @ 2010-06-20 23:13 ` mark gross 0 siblings, 0 replies; 75+ messages in thread From: mark gross @ 2010-06-20 23:13 UTC (permalink / raw) To: Rafael J. Wysocki Cc: markgross, linux-pm, Alan Stern, Matthew Garrett, Linux Kernel Mailing List, Dmitry Torokhov, Arve Hjønnevåg, Neil Brown, mark gross On Sun, Jun 20, 2010 at 02:49:14PM +0200, Rafael J. Wysocki wrote: > On Sunday, June 20, 2010, mark gross wrote: > > On Sun, Jun 20, 2010 at 12:05:35AM +0200, Rafael J. Wysocki wrote: > > > Hi, > > > > > > One of the arguments during the suspend blockers discussion was that the > > > mainline kernel didn't contain any mechanisms allowing it to avoid losing > > > wakeup events during system suspend. > > > > > > Generally, there are two problems in that area. First, if a wakeup event > > > occurs exactly at the same time when /sys/power/state is being written to, > > > the even may be delivered to user space right before the freezing of it, > > > in which case the user space consumer of the event may not be able to process > > yes this is racy. souldn't the wakeup event handers/driver force a user > > mode ACK before they stop failing suspend attempts? > > I'm not sure what you mean. There are wake-up events without any user space > consumers, like wake-on-LAN. I was thinking about the incoming call scenario. The 3G modem starts ringing and the we don't want the suspend to happen again until user mode ACKs the call (answer or ignore) however; you are right about events without user mode consumers. I forgot about those. > > > it before the system is suspended. Second, if a wakeup event occurs after user > > > space has been frozen and that event is not a wakeup interrupt, the kernel will > > > not react to it and the system will be suspended. > > > > If its not a wakeup interrupt is it not fair to allow the suspend to > > happen even if its handler's are "in flight" at suspend time? > > A wakeup event need not be an interrupt, or at least there are interrupts that > have other meanings, like ACPI SCI. true, but who desides if such a thing is a wakeup event? And how does that polocy get generalized in a platform portable way? (I don't think this is an issue for this current patch, its just a nagging issue to me in general with all this PM stuff) > > > The following patch illustrates my idea of how these two problems may be > > > addressed. It introduces a new global sysfs attribute, > > > /sys/power/wakeup_count, associated with a running counter of wakeup events > > > and a helper function, pm_wakeup_event(), that may be used by kernel subsystems > > > to increment the wakeup events counter. > > > > > > /sys/power/wakeup_count may be read from or written to by user space. Reads > > > will always succeed and return the current value of the wakeup events counter. > > > Writes, however, will only succeed if the written number is equal to the > > > current value of the wakeup events counter. If a write is successful, it will > > > cause the kernel to save the current value of the wakeup events counter and > > > to compare the saved number with the current value of the counter at certain > > > points of the subsequent suspend (or hibernate) sequence. If the two values > > > don't match, the suspend will be aborted just as though a wakeup interrupt > > > happened. Reading from /sys/power/wakeup_count again will turn that mechanism > > > off. > > > > why would you want to turn it off? > > In some cases I may want to suspend/hibernate losing some wakeup events, > like for example in the case of emergency hibernation on critical battery. Or a user directed explicit suspend request. your right. > > > The assumption is that there's a user space power manager that will first > > > read from /sys/power/wakeup_count. Then it will check all user space consumers > > > of wakeup events known to it for unprocessed events. If there are any, it will > > > wait for them to be processed and repeat. In turn, if there are not any, > > > it will try to write to /sys/power/wakeup_count and if the write is successful, > > > it will write to /sys/power/state to start suspend, so if any wakeup events > > > accur past that point, they will be noticed by the kernel and will eventually > > > cause the suspend to be aborted. > > > > > > In addition to the above, the patch adds a wakeup events counter to the > > > power member of struct device and makes these per-device wakeup event counters > > > available via sysfs, so that it's possible to check the activity of various > > > wakeup event sources within the kernel. > > > > > > To illustrate how subsystems can use pm_wakeup_event(), I added it to the > > > PCI runtime PM wakeup-handling code. > > > > > > At the moment the patch only contains code changes (ie. no documentation), > > > but I'm going to add comments etc. if people like the idea. > > > > > > Please tell me what you think. > > > > > > Rafael > > > > > > --- > > > drivers/base/power/Makefile | 2 - > > > drivers/base/power/main.c | 1 > > > drivers/base/power/power.h | 3 + > > > drivers/base/power/sysfs.c | 9 ++++ > > > drivers/base/power/wakeup.c | 74 ++++++++++++++++++++++++++++++++++++++++ > > > drivers/pci/pci-acpi.c | 2 + > > > drivers/pci/pcie/pme/pcie_pme.c | 2 + > > > include/linux/pm.h | 6 +++ > > > kernel/power/hibernate.c | 14 ++++--- > > > kernel/power/main.c | 24 ++++++++++++ > > > kernel/power/power.h | 6 +++ > > > kernel/power/suspend.c | 2 - > > > 12 files changed, 138 insertions(+), 7 deletions(-) > > > > > > Index: linux-2.6/kernel/power/main.c > > > =================================================================== > > > --- linux-2.6.orig/kernel/power/main.c > > > +++ linux-2.6/kernel/power/main.c > > > @@ -204,6 +204,28 @@ static ssize_t state_store(struct kobjec > > > > > > power_attr(state); > > > > > > +static ssize_t wakeup_count_show(struct kobject *kobj, > > > + struct kobj_attribute *attr, > > > + char *buf) > > > +{ > > > + return sprintf(buf, "%lu\n", pm_get_wakeup_count()); > > > +} > > > + > > > +static ssize_t wakeup_count_store(struct kobject *kobj, > > > + struct kobj_attribute *attr, > > > + const char *buf, size_t n) > > > +{ > > > + unsigned long val; > > > + > > > + if (sscanf(buf, "%lu", &val) == 1) { > > > + if (pm_save_wakeup_count(val)) > > > + return n; > > > + } > > > + return -EINVAL; > > > +} > > > + > > > +power_attr(wakeup_count); > > > + > > > #ifdef CONFIG_PM_TRACE > > > int pm_trace_enabled; > > > > > > @@ -236,6 +258,7 @@ static struct attribute * g[] = { > > > #endif > > > #ifdef CONFIG_PM_SLEEP > > > &pm_async_attr.attr, > > > + &wakeup_count_attr.attr, > > > #ifdef CONFIG_PM_DEBUG > > > &pm_test_attr.attr, > > > #endif > > > @@ -266,6 +289,7 @@ static int __init pm_init(void) > > > int error = pm_start_workqueue(); > > > if (error) > > > return error; > > > + pm_wakeup_events_init(); > > > power_kobj = kobject_create_and_add("power", NULL); > > > if (!power_kobj) > > > return -ENOMEM; > > > Index: linux-2.6/drivers/base/power/wakeup.c > > > =================================================================== > > > --- /dev/null > > > +++ linux-2.6/drivers/base/power/wakeup.c > > > @@ -0,0 +1,74 @@ > > > + > > > +#include <linux/device.h> > > > +#include <linux/pm.h> > > > + > > > +static unsigned long event_count; > > > +static unsigned long saved_event_count; > > > > what about over flow issues? > > There's no issue AFAICS. It only matters if the value is different from the > previous one after incrementation. > > > > +static bool events_check_enabled; > > > +static spinlock_t events_lock; > > > + > > > +void pm_wakeup_events_init(void) > > > +{ > > > + spin_lock_init(&events_lock); > > > +} > > > + > > > +void pm_wakeup_event(struct device *dev) > > > +{ > > > + unsigned long flags; > > > + > > > + spin_lock_irqsave(&events_lock, flags); > > > + event_count++; > > should event_count be an atomic type so we can not bother with taking > > the evnets_lock? > > The lock is there, because two counters are incremented at the same time. > Also, in some other places the counter is accessed along with the enable > flag, so there are two operations that need to be done in the same critical > section. ok, I was thinking there is only one counter that needs locking, and one global value writen too from user mode that didn't need locking. > > > > + if (dev) > > > + dev->power.wakeup_count++; > > > + spin_unlock_irqrestore(&events_lock, flags); > > > +} > > > + > > > +bool pm_check_wakeup_events(bool enable) > > > +{ > > > + unsigned long flags; > > > + bool ret; > > > + > > > + spin_lock_irqsave(&events_lock, flags); > > > + ret = !events_check_enabled || (event_count == saved_event_count); > > I'm not getting the events_check_enbled flag yet. > > I tells the PM core whether or not to check wakeup events during suspend. > The checking is only enabled after a successful write to > /sys/power/wakeup_count, it is disabled by reads and by the final check right > before the system enters suspend. [That's because the "saved" value from > the previous suspend should not be used for checking wakeup events during > the next one.] > > > > + events_check_enabled = enable; > > I'm not sure if this is the right thing depending on all the different > > ways the boolians are interacting with eachother. > > > > Which is a red flag to me. This code is confusing. > > Well, I could use check_wakeup_events() and > check_and_disable_wakeup_events() (the latter is only necessary for the final > check), but I'm not sure if that's going to be better. I'm not sure either what you have is growing on me anyway. > > I'll look at it some more when I'm fresh tomorrow. > > Thanks for the comments. I still need to look more at the patch. --mgross ^ permalink raw reply [flat|nested] 75+ messages in thread
* Re: [RFC][PATCH] PM: Avoid losing wakeup events during suspend 2010-06-19 22:05 [RFC][PATCH] PM: Avoid losing wakeup events during suspend Rafael J. Wysocki 2010-06-20 5:52 ` mark gross @ 2010-06-20 16:28 ` Alan Stern 2010-06-20 21:50 ` Rafael J. Wysocki 2010-06-20 22:58 ` mark gross 1 sibling, 2 replies; 75+ messages in thread From: Alan Stern @ 2010-06-20 16:28 UTC (permalink / raw) To: Rafael J. Wysocki Cc: linux-pm, Matthew Garrett, Linux Kernel Mailing List, Dmitry Torokhov, Arve Hjønnevåg, Neil Brown, mark gross On Sun, 20 Jun 2010, Rafael J. Wysocki wrote: > Hi, > > One of the arguments during the suspend blockers discussion was that the > mainline kernel didn't contain any mechanisms allowing it to avoid losing > wakeup events during system suspend. > > Generally, there are two problems in that area. First, if a wakeup event > occurs exactly at the same time when /sys/power/state is being written to, > the even may be delivered to user space right before the freezing of it, > in which case the user space consumer of the event may not be able to process > it before the system is suspended. Indeed, the same problem arises if the event isn't delivered to userspace until after userspace is frozen. Of course, the underlying issue here is that the kernel has no direct way to know when userspace has finished processing an event. Userspace would have to tell it, which generally would mean rewriting some large number of user programs. > Second, if a wakeup event occurs after user > space has been frozen and that event is not a wakeup interrupt, the kernel will > not react to it and the system will be suspended. I don't quite understand what you mean here. "Reacting" to an event involves more than one action. The kernel has to tell the hardware to stop generating the wakeup signal, and it has to handle the event somehow. If the kernel doesn't tell the hardware to stop generating the wakeup signal, the signal will continue to be active until the system goes to sleep. At that point it will cause the system to wake up immediately, so there won't be any problem. The real problem arises when the hardware stops generating the wakeup signal but the kernel suspends before it finishes handling the event. For example, an interrupt handler might receive the event and start processing it by calling pm_request_resume() -- but if the pm workqueue thread is already frozen then the processing won't finish until something else wakes the system up. (IMO this is a potential bug which could be fixed without too much effort.) > The following patch illustrates my idea of how these two problems may be > addressed. It introduces a new global sysfs attribute, > /sys/power/wakeup_count, associated with a running counter of wakeup events > and a helper function, pm_wakeup_event(), that may be used by kernel subsystems > to increment the wakeup events counter. In what way is this better than suspend blockers? > /sys/power/wakeup_count may be read from or written to by user space. Reads > will always succeed and return the current value of the wakeup events counter. > Writes, however, will only succeed if the written number is equal to the > current value of the wakeup events counter. If a write is successful, it will > cause the kernel to save the current value of the wakeup events counter and > to compare the saved number with the current value of the counter at certain > points of the subsequent suspend (or hibernate) sequence. If the two values > don't match, the suspend will be aborted just as though a wakeup interrupt > happened. Reading from /sys/power/wakeup_count again will turn that mechanism > off. > > The assumption is that there's a user space power manager that will first > read from /sys/power/wakeup_count. Then it will check all user space consumers > of wakeup events known to it for unprocessed events. What happens if an event arrives just before you read /sys/power/wakeup_count, but the userspace consumer doesn't realize there is a new unprocessed event until after the power manager checks it? Your plan is missing a critical step: the "handoff" whereby responsibility for handling an event passes from the kernel to userspace. With suspend blockers, this handoff occurs when an event queue is emptied and its associate suspend blocker is deactivated. Or with some kinds of events for which the Android people have not written an explicit handoff, it occurs when a timer expires (timed suspend blockers). > If there are any, it will > wait for them to be processed and repeat. In turn, if there are not any, > it will try to write to /sys/power/wakeup_count and if the write is successful, > it will write to /sys/power/state to start suspend, so if any wakeup events > accur past that point, they will be noticed by the kernel and will eventually > cause the suspend to be aborted. This shares with the other alternatives posted recently the need for a central power-manager process. And like in-kernel suspend blockers, it requires changes to wakeup-capable drivers (the wakeup-events counter has to be incremented). One advantage of the suspend-blocker approach is that it essentially uses a single tool to handle both kinds of races (event not fully handled by the kernel, or event not fully handled by userspace). Things aren't quite this simple, because of the need for a special API to implement userspace suspend blockers, but this does avoid the need for a power-manager process. > In addition to the above, the patch adds a wakeup events counter to the > power member of struct device and makes these per-device wakeup event counters > available via sysfs, so that it's possible to check the activity of various > wakeup event sources within the kernel. > > To illustrate how subsystems can use pm_wakeup_event(), I added it to the > PCI runtime PM wakeup-handling code. > > At the moment the patch only contains code changes (ie. no documentation), > but I'm going to add comments etc. if people like the idea. > > Please tell me what you think. While this isn't a bad idea, I don't see how it is superior to the other alternatives that have been proposed. Alan Stern ^ permalink raw reply [flat|nested] 75+ messages in thread
* Re: [RFC][PATCH] PM: Avoid losing wakeup events during suspend 2010-06-20 16:28 ` Alan Stern @ 2010-06-20 21:50 ` Rafael J. Wysocki 2010-06-21 2:23 ` Alan Stern 2010-06-20 22:58 ` mark gross 1 sibling, 1 reply; 75+ messages in thread From: Rafael J. Wysocki @ 2010-06-20 21:50 UTC (permalink / raw) To: Alan Stern Cc: linux-pm, Matthew Garrett, Linux Kernel Mailing List, Dmitry Torokhov, Arve Hjønnevåg, Neil Brown, mark gross On Sunday, June 20, 2010, Alan Stern wrote: > On Sun, 20 Jun 2010, Rafael J. Wysocki wrote: > > > Hi, > > > > One of the arguments during the suspend blockers discussion was that the > > mainline kernel didn't contain any mechanisms allowing it to avoid losing > > wakeup events during system suspend. > > > > Generally, there are two problems in that area. First, if a wakeup event > > occurs exactly at the same time when /sys/power/state is being written to, > > the even may be delivered to user space right before the freezing of it, > > in which case the user space consumer of the event may not be able to process > > it before the system is suspended. > > Indeed, the same problem arises if the event isn't delivered to > userspace until after userspace is frozen. In that case the kernel should abort the suspend so that the event can be delivered to the user space. > Of course, the underlying issue here is that the kernel has no direct way > to know when userspace has finished processing an event. Userspace would > have to tell it, which generally would mean rewriting some large number of user > programs. I'm not sure of that. If the kernel doesn't initiate suspend, it doesn't really need to know whether or not user space has already consumed the event. > > Second, if a wakeup event occurs after user > > space has been frozen and that event is not a wakeup interrupt, the kernel will > > not react to it and the system will be suspended. > > I don't quite understand what you mean here. "Reacting" to an event > involves more than one action. The kernel has to tell the hardware to > stop generating the wakeup signal, and it has to handle the event > somehow. Yes. I meant that the event wouldn't cause the suspend to be aborted. > If the kernel doesn't tell the hardware to stop generating the wakeup > signal, the signal will continue to be active until the system goes to > sleep. At that point it will cause the system to wake up immediately, > so there won't be any problem. > > The real problem arises when the hardware stops generating the wakeup > signal but the kernel suspends before it finishes handling the event. > For example, an interrupt handler might receive the event and start > processing it by calling pm_request_resume() -- but if the pm workqueue > thread is already frozen then the processing won't finish until > something else wakes the system up. (IMO this is a potential bug which > could be fixed without too much effort.) That's why I put pm_wakeup_event() into the PCI runtime wakeup code, which doesn't run from the PM workqueue. > > The following patch illustrates my idea of how these two problems may be > > addressed. It introduces a new global sysfs attribute, > > /sys/power/wakeup_count, associated with a running counter of wakeup events > > and a helper function, pm_wakeup_event(), that may be used by kernel subsystems > > to increment the wakeup events counter. > > In what way is this better than suspend blockers? It doesn't add any new framework and it doesn't require the users of pm_wakeup_event() to "unblock" suspend, so it is simpler. It also doesn't add the user space interface that caused so much opposition to appear. > > /sys/power/wakeup_count may be read from or written to by user space. Reads > > will always succeed and return the current value of the wakeup events counter. > > Writes, however, will only succeed if the written number is equal to the > > current value of the wakeup events counter. If a write is successful, it will > > cause the kernel to save the current value of the wakeup events counter and > > to compare the saved number with the current value of the counter at certain > > points of the subsequent suspend (or hibernate) sequence. If the two values > > don't match, the suspend will be aborted just as though a wakeup interrupt > > happened. Reading from /sys/power/wakeup_count again will turn that mechanism > > off. > > > > The assumption is that there's a user space power manager that will first > > read from /sys/power/wakeup_count. Then it will check all user space consumers > > of wakeup events known to it for unprocessed events. > > What happens if an event arrives just before you read > /sys/power/wakeup_count, but the userspace consumer doesn't realize > there is a new unprocessed event until after the power manager checks > it? Your plan is missing a critical step: the "handoff" whereby > responsibility for handling an event passes from the kernel to > userspace. I think this is not the kernel's problem. In this approach the kernel makes it possible for the user space to avoid the race. Whether or not the user space will use this opportunity is a different matter. > With suspend blockers, this handoff occurs when an event queue is > emptied and its associate suspend blocker is deactivated. Or with some > kinds of events for which the Android people have not written an > explicit handoff, it occurs when a timer expires (timed suspend > blockers). Well, quite frankly, I don't see any difference here. In either case there is a possibility for user space to mess up things and the kernel can't really help that. > > If there are any, it will > > wait for them to be processed and repeat. In turn, if there are not any, > > it will try to write to /sys/power/wakeup_count and if the write is successful, > > it will write to /sys/power/state to start suspend, so if any wakeup events > > accur past that point, they will be noticed by the kernel and will eventually > > cause the suspend to be aborted. > > This shares with the other alternatives posted recently the need for a > central power-manager process. And like in-kernel suspend blockers, it > requires changes to wakeup-capable drivers (the wakeup-events counter > has to be incremented). It doesn't really require changes to drivers, but to code that knows of wakeup events, like the PCI runtime wakeup code. Moreover, it doesn't require kernel subsystems to know or even care when it is reasonable to allow suspend to happen. The only thing they need to do is to call pm_wakeup_event() whenever they see a wakeup event. I don't really think it is too much of a requirement (and quite frnakly I can't imagine anything simpler than that). > One advantage of the suspend-blocker approach is that it essentially > uses a single tool to handle both kinds of races (event not fully > handled by the kernel, or event not fully handled by userspace). > Things aren't quite this simple, because of the need for a special API > to implement userspace suspend blockers, but this does avoid the need > for a power-manager process. Yes, it does, but I have an idea about how to implement such a power manager and I'm going to actually try it. > > In addition to the above, the patch adds a wakeup events counter to the > > power member of struct device and makes these per-device wakeup event counters > > available via sysfs, so that it's possible to check the activity of various > > wakeup event sources within the kernel. > > > > To illustrate how subsystems can use pm_wakeup_event(), I added it to the > > PCI runtime PM wakeup-handling code. > > > > At the moment the patch only contains code changes (ie. no documentation), > > but I'm going to add comments etc. if people like the idea. > > > > Please tell me what you think. > > While this isn't a bad idea, I don't see how it is superior to the > other alternatives that have been proposed. I don't think any of the approaches that don't use suspend blockers allows one to avoid the race between the process that writes to /sys/power/state and a wakeup event happening at the same time. They attempt to address another issue, which is how to prevent untrusted user space processes from keeping the system out of idle, but that is a different story. My patch is all about the (system-wide) suspend mechanism, regardless of whether or not it is used for opportunistic suspending. Rafael ^ permalink raw reply [flat|nested] 75+ messages in thread
* Re: [RFC][PATCH] PM: Avoid losing wakeup events during suspend 2010-06-20 21:50 ` Rafael J. Wysocki @ 2010-06-21 2:23 ` Alan Stern 2010-06-21 5:32 ` Florian Mickler ` (2 more replies) 0 siblings, 3 replies; 75+ messages in thread From: Alan Stern @ 2010-06-21 2:23 UTC (permalink / raw) To: Rafael J. Wysocki Cc: Linux-pm mailing list, Matthew Garrett, Linux Kernel Mailing List, Dmitry Torokhov, Arve Hjønnevåg, Neil Brown, mark gross On Sun, 20 Jun 2010, Rafael J. Wysocki wrote: > > > Generally, there are two problems in that area. First, if a wakeup event > > > occurs exactly at the same time when /sys/power/state is being written to, > > > the even may be delivered to user space right before the freezing of it, > > > in which case the user space consumer of the event may not be able to process > > > it before the system is suspended. > > > > Indeed, the same problem arises if the event isn't delivered to > > userspace until after userspace is frozen. > > In that case the kernel should abort the suspend so that the event can be > delivered to the user space. Yes. > > Of course, the underlying issue here is that the kernel has no direct way > > to know when userspace has finished processing an event. Userspace would > > have to tell it, which generally would mean rewriting some large number of user > > programs. > > I'm not sure of that. If the kernel doesn't initiate suspend, it doesn't > really need to know whether or not user space has already consumed the event. That's true. But it only shifts the onus: When a userspace program has finished processing an event, it has to tell the power-manager process. Clearly this sort of thing is unavoidable, one way or another. > > > The following patch illustrates my idea of how these two problems may be > > > addressed. It introduces a new global sysfs attribute, > > > /sys/power/wakeup_count, associated with a running counter of wakeup events > > > and a helper function, pm_wakeup_event(), that may be used by kernel subsystems > > > to increment the wakeup events counter. > > > > In what way is this better than suspend blockers? > > It doesn't add any new framework and it doesn't require the users of > pm_wakeup_event() to "unblock" suspend, so it is simpler. It also doesn't add > the user space interface that caused so much opposition to appear. Okay. A quick comparison shows that in your proposal: There's no need to register and unregister suspend blockers. But instead you create the equivalent of a suspend blocker inside every struct device. Drivers (or subsystems) don't have to activate suspend blockers. But instead they have to call pm_wakeup_event(). Drivers don't have to deactivate suspend blockers. You don't have anything equivalent, and as a result your scheme is subject to the race described below. There are no userspace suspend blockers and no opportunistic suspend. Instead a power-manager process takes care of initiating or preventing suspends as needed. In short, you have eliminated the userspace part of the suspend blocker approach just as in some of the proposals posted earlier, and you have replaced the in-kernel suspend blockers with new data in struct device and a new PM API. On the whole, it doesn't seem very different from the in-kernel part of suspend blockers. The most notable difference is the name: pm_wake_event() vs. suspend_blocker_activate(), or whatever it ended up being called. This is the race I was talking about: > > What happens if an event arrives just before you read > > /sys/power/wakeup_count, but the userspace consumer doesn't realize > > there is a new unprocessed event until after the power manager checks > > it? > I think this is not the kernel's problem. In this approach the kernel makes it > possible for the user space to avoid the race. Whether or not the user space > will use this opportunity is a different matter. It is _not_ possible for userspace to avoid this race. Help from the kernel is needed. > > Your plan is missing a critical step: the "handoff" whereby > > responsibility for handling an event passes from the kernel to > > userspace. > > With suspend blockers, this handoff occurs when an event queue is > > emptied and its associate suspend blocker is deactivated. Or with some > > kinds of events for which the Android people have not written an > > explicit handoff, it occurs when a timer expires (timed suspend > > blockers). > > Well, quite frankly, I don't see any difference here. In either case there is > a possibility for user space to mess up things and the kernel can't really help > that. With suspend blockers, there is also the possibility for userspace to handle races correctly. But with your scheme there isn't -- that's the difference. > > This shares with the other alternatives posted recently the need for a > > central power-manager process. And like in-kernel suspend blockers, it > > requires changes to wakeup-capable drivers (the wakeup-events counter > > has to be incremented). > > It doesn't really require changes to drivers, but to code that knows of wakeup > events, like the PCI runtime wakeup code. Just like in-kernel suspend blockers. > Moreover, it doesn't require kernel > subsystems to know or even care when it is reasonable to allow suspend to > happen. The only thing they need to do is to call pm_wakeup_event() whenever > they see a wakeup event. That's just semantics. Obviously a wakeup event should prevent suspend from happening, so if subsystems know or care about one then they know or care about the other. > I don't really think it is too much of a requirement > (and quite frnakly I can't imagine anything simpler than that). This is because you have omitted the part about allowing suspends again (or if you prefer, about notifying the PM core that a wakeup event has been handed off to userspace). As a result of leaving this out, you haven't eliminated all the races. > Yes, it does, but I have an idea about how to implement such a power manager > and I'm going to actually try it. A logical design would be to use dbus for disseminating PM-related information. Does your idea work that way? > I don't think any of the approaches that don't use suspend blockers allows > one to avoid the race between the process that writes to /sys/power/state > and a wakeup event happening at the same time. They attempt to address another > issue, which is how to prevent untrusted user space processes from keeping the > system out of idle, but that is a different story. Well, there was one approach that didn't use suspend blockers and did solve the race: the original wakelocks proposal. Of course, that was just suspend blockers under a different name. One could make a very good case that your scheme is also suspend blockers under a different name (and with an important part missing). Alan Stern ^ permalink raw reply [flat|nested] 75+ messages in thread
* Re: [RFC][PATCH] PM: Avoid losing wakeup events during suspend 2010-06-21 2:23 ` Alan Stern @ 2010-06-21 5:32 ` Florian Mickler 2010-06-21 15:23 ` Alan Stern 2010-06-21 16:54 ` Alan Stern 2010-06-21 6:13 ` mark gross 2010-06-21 21:58 ` Rafael J. Wysocki 2 siblings, 2 replies; 75+ messages in thread From: Florian Mickler @ 2010-06-21 5:32 UTC (permalink / raw) To: Alan Stern Cc: Rafael J. Wysocki, Linux-pm mailing list, Matthew Garrett, Linux Kernel Mailing List, Dmitry Torokhov, Arve Hjønnevåg, Neil Brown, mark gross On Sun, 20 Jun 2010 22:23:38 -0400 (EDT) Alan Stern <stern@rowland.harvard.edu> wrote: > On Sun, 20 Jun 2010, Rafael J. Wysocki wrote: > > > In what way is this better than suspend blockers? > > > > It doesn't add any new framework and it doesn't require the users of > > pm_wakeup_event() to "unblock" suspend, so it is simpler. It also doesn't add > > the user space interface that caused so much opposition to appear. > > Okay. A quick comparison shows that in your proposal: > > There's no need to register and unregister suspend blockers. > But instead you create the equivalent of a suspend blocker > inside every struct device. > > Drivers (or subsystems) don't have to activate suspend > blockers. But instead they have to call pm_wakeup_event(). > > Drivers don't have to deactivate suspend blockers. You don't > have anything equivalent, and as a result your scheme is > subject to the race described below. > > There are no userspace suspend blockers and no opportunistic > suspend. Instead a power-manager process takes care of > initiating or preventing suspends as needed. > > In short, you have eliminated the userspace part of the suspend blocker > approach just as in some of the proposals posted earlier, and you have > replaced the in-kernel suspend blockers with new data in struct device > and a new PM API. On the whole, it doesn't seem very different from > the in-kernel part of suspend blockers. The most notable difference is > the name: pm_wake_event() vs. suspend_blocker_activate(), or whatever > it ended up being called. > > This is the race I was talking about: > > > > What happens if an event arrives just before you read > > > /sys/power/wakeup_count, but the userspace consumer doesn't realize > > > there is a new unprocessed event until after the power manager checks > > > it? > > > I think this is not the kernel's problem. In this approach the kernel makes it > > possible for the user space to avoid the race. Whether or not the user space > > will use this opportunity is a different matter. > > It is _not_ possible for userspace to avoid this race. Help from the > kernel is needed. It is possible if every (relevant) userspace program implements a callback for the powermanager to check if one of it's wakeup-sources got activated. That way the powermanager would read /sys/power/wakeup_count, then do the roundtrip to all it's registered users and only then suspend. This turns the suspend_blockers concept around. Instead of actively signaling the suspend_blockers, the userspace programs only answer "yes/no" when asked. (i.e. polling?) You _can not_ implement userspace suspend blockers with this approach, as it is vital for every userspace program to get scheduled and check it's wakeup-source (if even possible) before you know that the right parties have won the race. Cheers, Flo ^ permalink raw reply [flat|nested] 75+ messages in thread
* Re: [RFC][PATCH] PM: Avoid losing wakeup events during suspend 2010-06-21 5:32 ` Florian Mickler @ 2010-06-21 15:23 ` Alan Stern 2010-06-21 20:38 ` Florian Mickler 2010-06-21 16:54 ` Alan Stern 1 sibling, 1 reply; 75+ messages in thread From: Alan Stern @ 2010-06-21 15:23 UTC (permalink / raw) To: Florian Mickler Cc: Rafael J. Wysocki, Linux-pm mailing list, Matthew Garrett, Linux Kernel Mailing List, Dmitry Torokhov, Arve Hjønnevåg, Neil Brown, mark gross On Mon, 21 Jun 2010, Florian Mickler wrote: > On Sun, 20 Jun 2010 22:23:38 -0400 (EDT) > Alan Stern <stern@rowland.harvard.edu> wrote: > > This is the race I was talking about: > > > > > > What happens if an event arrives just before you read > > > > /sys/power/wakeup_count, but the userspace consumer doesn't realize > > > > there is a new unprocessed event until after the power manager checks > > > > it? > > > > > I think this is not the kernel's problem. In this approach the kernel makes it > > > possible for the user space to avoid the race. Whether or not the user space > > > will use this opportunity is a different matter. > > > > It is _not_ possible for userspace to avoid this race. Help from the > > kernel is needed. > > It is possible if every (relevant) userspace program implements a > callback for the powermanager to check if one of it's wakeup-sources > got activated. > > That way the powermanager would read /sys/power/wakeup_count, then do > the roundtrip to all it's registered users and only then suspend. > > This turns the suspend_blockers concept around. Instead of actively > signaling the suspend_blockers, the userspace programs only answer > "yes/no" when asked. (i.e. polling?) In the end you would want to have communication in both directions: suspend blockers _and_ callbacks. Polling is bad if done too often. But I think the idea is a good one. In fact, you don't need a "yes/no" response. Programs merely need a chance to activate a new suspend blocker if a wakeup source was recently activated before they acknowledge the poll. > You _can not_ implement userspace suspend blockers with this approach, > as it is vital for every userspace program to get scheduled and check > it's wakeup-source (if even possible) before you know that the right > parties have won the race. I'm not sure what you mean. Certainly you can take a userspace suspend-blocker implementation of the sort discussed before (where programs communicate their needs to a central power-manager process) and add this callback mechanism on top. There is still at least one loophole to be closed: Android's timer-based wakelocks. These include cases where the Android developers didn't add enough wakelocks to cover the entire path from kernel-event to userspace-handler, so they punted and relied on a timer to decide when the wakelock should be deactivated. (There may be other cases too; I didn't follow the original discussion very closely.) It's not clear whether these things can be handled already in Rafael's scheme with your addition, or whether something new is needed. Alan Stern ^ permalink raw reply [flat|nested] 75+ messages in thread
* Re: [RFC][PATCH] PM: Avoid losing wakeup events during suspend 2010-06-21 15:23 ` Alan Stern @ 2010-06-21 20:38 ` Florian Mickler 2010-06-21 22:18 ` Alan Stern 0 siblings, 1 reply; 75+ messages in thread From: Florian Mickler @ 2010-06-21 20:38 UTC (permalink / raw) To: Alan Stern Cc: Rafael J. Wysocki, Linux-pm mailing list, Matthew Garrett, Linux Kernel Mailing List, Dmitry Torokhov, Arve Hjønnevåg, Neil Brown, mark gross On Mon, 21 Jun 2010 11:23:33 -0400 (EDT) Alan Stern <stern@rowland.harvard.edu> wrote: > On Mon, 21 Jun 2010, Florian Mickler wrote: > > > On Sun, 20 Jun 2010 22:23:38 -0400 (EDT) > > Alan Stern <stern@rowland.harvard.edu> wrote: > > > > This is the race I was talking about: > > > > > > > > What happens if an event arrives just before you read > > > > > /sys/power/wakeup_count, but the userspace consumer doesn't realize > > > > > there is a new unprocessed event until after the power manager checks > > > > > it? > > > > > > > I think this is not the kernel's problem. In this approach the kernel makes it > > > > possible for the user space to avoid the race. Whether or not the user space > > > > will use this opportunity is a different matter. > > > > > > It is _not_ possible for userspace to avoid this race. Help from the > > > kernel is needed. > > > > It is possible if every (relevant) userspace program implements a > > callback for the powermanager to check if one of it's wakeup-sources > > got activated. > > > > That way the powermanager would read /sys/power/wakeup_count, then do > > the roundtrip to all it's registered users and only then suspend. > > > > This turns the suspend_blockers concept around. Instead of actively > > signaling the suspend_blockers, the userspace programs only answer > > "yes/no" when asked. (i.e. polling?) > > In the end you would want to have communication in both directions: > suspend blockers _and_ callbacks. Polling is bad if done too often. > But I think the idea is a good one. Actually, I'm not so shure. 1. you have to roundtrip whereas in the suspend_blocker scheme you have active annotations (i.e. no further action needed) 2. it may not be possible for a user to determine if a wake-event is in-flight. you would have to somehow pass the wake-event-number with it, so that the userspace process could ack it properly without confusion. Or... I don't know of anything else... 1. userspace-manager (UM) reads a number (42). 2. it questions userspace program X: is it ok to suspend? [please fill in how userspace program X determines to block suspend] 3a. UM's roundtrip ends and it proceeds to write "42" to the kernel [suspending] 3b. UM's roundtrip ends and it aborts suspend, because a (userspace-)suspend-blocker got activated I'm not shure how the userspace program could determine that there is a wake-event in flight. Perhaps by storing the number of last wake-event. But then you need per-wake-event-counters... :| > In fact, you don't need a "yes/no" response. Programs merely need a > chance to activate a new suspend blocker if a wakeup source was > recently activated before they acknowledge the poll. true. (incorporated alreeady above) > > > You _can not_ implement userspace suspend blockers with this approach, > > as it is vital for every userspace program to get scheduled and check > > it's wakeup-source (if even possible) before you know that the right > > parties have won the race. > > I'm not sure what you mean. Sorry, that was not understandable. What I meant was that you "_can not_" implement the suspend-blockers scheme, where you don't need to roundtrip through all userspace (with all it's glory). ( => you need the roundtrip here) > > There is still at least one loophole to be closed: Android's > timer-based wakelocks. These include cases where the Android > developers didn't add enough wakelocks to cover the entire path from > kernel-event to userspace-handler, so they punted and relied on a timer > to decide when the wakelock should be deactivated. (There may be other > cases too; I didn't follow the original discussion very closely.) > It's not clear whether these things can be handled already in Rafael's > scheme with your addition, or whether something new is needed. > > Alan Stern Do you have some thoughts about the wake-event-in-flight detection? Cheers, Flo ^ permalink raw reply [flat|nested] 75+ messages in thread
* Re: [RFC][PATCH] PM: Avoid losing wakeup events during suspend 2010-06-21 20:38 ` Florian Mickler @ 2010-06-21 22:18 ` Alan Stern 2010-06-21 22:40 ` Rafael J. Wysocki 0 siblings, 1 reply; 75+ messages in thread From: Alan Stern @ 2010-06-21 22:18 UTC (permalink / raw) To: Florian Mickler Cc: Rafael J. Wysocki, Linux-pm mailing list, Matthew Garrett, Linux Kernel Mailing List, Dmitry Torokhov, Arve Hjønnevåg, Neil Brown, mark gross On Mon, 21 Jun 2010, Florian Mickler wrote: > > In the end you would want to have communication in both directions: > > suspend blockers _and_ callbacks. Polling is bad if done too often. > > But I think the idea is a good one. > > Actually, I'm not so shure. > > 1. you have to roundtrip whereas in the suspend_blocker scheme you have > active annotations (i.e. no further action needed) That's why it's best to use both. The normal case is that programs activate and deactivate blockers by sending one-way messages to the PM process. The exceptional case is when the PM process is about to initiate a suspend; that's when it does the round-trip polling. Since the only purpose of the polling is to avoid a race, 90% of the time it will succeed. > 2. it may not be possible for a user to determine if a wake-event is > in-flight. you would have to somehow pass the wake-event-number with > it, so that the userspace process could ack it properly without > confusion. Or... I don't know of anything else... > > 1. userspace-manager (UM) reads a number (42). > > 2. it questions userspace program X: is it ok to suspend? > > [please fill in how userspace program X determines to block > suspend] > > 3a. UM's roundtrip ends and it proceeds to write "42" to the > kernel [suspending] > 3b. UM's roundtrip ends and it aborts suspend, because a > (userspace-)suspend-blocker got activated > > I'm not shure how the userspace program could determine that there is a > wake-event in flight. Perhaps by storing the number of last wake-event. > But then you need per-wake-event-counters... :| Rafael seems to think timeouts will fix this. I'm not so sure. > Do you have some thoughts about the wake-event-in-flight detection? Not really, except for something like the original wakelock scheme in which the kernel tells the PM core when an event is over. Alan Stern ^ permalink raw reply [flat|nested] 75+ messages in thread
* Re: [RFC][PATCH] PM: Avoid losing wakeup events during suspend 2010-06-21 22:18 ` Alan Stern @ 2010-06-21 22:40 ` Rafael J. Wysocki 2010-06-21 22:48 ` Rafael J. Wysocki ` (2 more replies) 0 siblings, 3 replies; 75+ messages in thread From: Rafael J. Wysocki @ 2010-06-21 22:40 UTC (permalink / raw) To: Alan Stern Cc: Florian Mickler, Linux-pm mailing list, Matthew Garrett, Linux Kernel Mailing List, Dmitry Torokhov, Arve Hjønnevåg, Neil Brown, mark gross On Tuesday, June 22, 2010, Alan Stern wrote: > On Mon, 21 Jun 2010, Florian Mickler wrote: > > > > In the end you would want to have communication in both directions: > > > suspend blockers _and_ callbacks. Polling is bad if done too often. > > > But I think the idea is a good one. > > > > Actually, I'm not so shure. > > > > 1. you have to roundtrip whereas in the suspend_blocker scheme you have > > active annotations (i.e. no further action needed) > > That's why it's best to use both. The normal case is that programs > activate and deactivate blockers by sending one-way messages to the PM > process. The exceptional case is when the PM process is about to > initiate a suspend; that's when it does the round-trip polling. Since > the only purpose of the polling is to avoid a race, 90% of the time it > will succeed. > > > 2. it may not be possible for a user to determine if a wake-event is > > in-flight. you would have to somehow pass the wake-event-number with > > it, so that the userspace process could ack it properly without > > confusion. Or... I don't know of anything else... > > > > 1. userspace-manager (UM) reads a number (42). > > > > 2. it questions userspace program X: is it ok to suspend? > > > > [please fill in how userspace program X determines to block > > suspend] > > > > 3a. UM's roundtrip ends and it proceeds to write "42" to the > > kernel [suspending] > > 3b. UM's roundtrip ends and it aborts suspend, because a > > (userspace-)suspend-blocker got activated > > > > I'm not shure how the userspace program could determine that there is a > > wake-event in flight. Perhaps by storing the number of last wake-event. > > But then you need per-wake-event-counters... :| > > Rafael seems to think timeouts will fix this. I'm not so sure. > > > Do you have some thoughts about the wake-event-in-flight detection? > > Not really, except for something like the original wakelock scheme in > which the kernel tells the PM core when an event is over. But the kernel doesn't really know that, so it really can't tell the PM core anything useful. What happens with suspend blockers is that a kernel suspend cooperates with a user space consumer of the event to get the story straight. However, that will only work if the user space is not buggy and doesn't crash, for example, before releasing the suspend blocker it's holding. Apart from this, there are those events withoug user space "handoff" that use timeouts. Also there are events like wake-on-LAN that can be regarded as instantaneous from the power manager's point of view, so they don't really need all of the "suspend blockers" machinery and for them we will need to use a cooldown timeout anyway. And if we need to use that cooldown timeout, I don't see why not to use timeouts for avoiding the race you're worrying about. Rafael ^ permalink raw reply [flat|nested] 75+ messages in thread
* Re: [RFC][PATCH] PM: Avoid losing wakeup events during suspend 2010-06-21 22:40 ` Rafael J. Wysocki @ 2010-06-21 22:48 ` Rafael J. Wysocki 2010-06-22 0:50 ` Arve Hjønnevåg 2010-06-22 10:21 ` Rafael J. Wysocki 2 siblings, 0 replies; 75+ messages in thread From: Rafael J. Wysocki @ 2010-06-21 22:48 UTC (permalink / raw) To: Alan Stern Cc: Florian Mickler, Linux-pm mailing list, Matthew Garrett, Linux Kernel Mailing List, Dmitry Torokhov, Arve Hjønnevåg, Neil Brown, mark gross On Tuesday, June 22, 2010, Rafael J. Wysocki wrote: > On Tuesday, June 22, 2010, Alan Stern wrote: > > On Mon, 21 Jun 2010, Florian Mickler wrote: > > > > > > In the end you would want to have communication in both directions: > > > > suspend blockers _and_ callbacks. Polling is bad if done too often. > > > > But I think the idea is a good one. > > > > > > Actually, I'm not so shure. > > > > > > 1. you have to roundtrip whereas in the suspend_blocker scheme you have > > > active annotations (i.e. no further action needed) > > > > That's why it's best to use both. The normal case is that programs > > activate and deactivate blockers by sending one-way messages to the PM > > process. The exceptional case is when the PM process is about to > > initiate a suspend; that's when it does the round-trip polling. Since > > the only purpose of the polling is to avoid a race, 90% of the time it > > will succeed. > > > > > 2. it may not be possible for a user to determine if a wake-event is > > > in-flight. you would have to somehow pass the wake-event-number with > > > it, so that the userspace process could ack it properly without > > > confusion. Or... I don't know of anything else... > > > > > > 1. userspace-manager (UM) reads a number (42). > > > > > > 2. it questions userspace program X: is it ok to suspend? > > > > > > [please fill in how userspace program X determines to block > > > suspend] > > > > > > 3a. UM's roundtrip ends and it proceeds to write "42" to the > > > kernel [suspending] > > > 3b. UM's roundtrip ends and it aborts suspend, because a > > > (userspace-)suspend-blocker got activated > > > > > > I'm not shure how the userspace program could determine that there is a > > > wake-event in flight. Perhaps by storing the number of last wake-event. > > > But then you need per-wake-event-counters... :| > > > > Rafael seems to think timeouts will fix this. I'm not so sure. > > > > > Do you have some thoughts about the wake-event-in-flight detection? > > > > Not really, except for something like the original wakelock scheme in > > which the kernel tells the PM core when an event is over. > > But the kernel doesn't really know that, so it really can't tell the PM core > anything useful. What happens with suspend blockers is that a kernel suspend s/suspend/subsyste/ (-ETOOLATE) > cooperates with a user space consumer of the event to get the story straight. > > However, that will only work if the user space is not buggy and doesn't crash, > for example, before releasing the suspend blocker it's holding. > > Apart from this, there are those events withoug user space "handoff" that > use timeouts. > > Also there are events like wake-on-LAN that can be regarded as instantaneous > from the power manager's point of view, so they don't really need all of the > "suspend blockers" machinery and for them we will need to use a cooldown > timeout anyway. > > And if we need to use that cooldown timeout, I don't see why not to use > timeouts for avoiding the race you're worrying about. ^ permalink raw reply [flat|nested] 75+ messages in thread
* Re: [RFC][PATCH] PM: Avoid losing wakeup events during suspend 2010-06-21 22:40 ` Rafael J. Wysocki 2010-06-21 22:48 ` Rafael J. Wysocki @ 2010-06-22 0:50 ` Arve Hjønnevåg 2010-06-22 10:21 ` Rafael J. Wysocki 2 siblings, 0 replies; 75+ messages in thread From: Arve Hjønnevåg @ 2010-06-22 0:50 UTC (permalink / raw) To: Rafael J. Wysocki Cc: Alan Stern, Florian Mickler, Linux-pm mailing list, Matthew Garrett, Linux Kernel Mailing List, Dmitry Torokhov, Neil Brown, mark gross On Mon, Jun 21, 2010 at 3:40 PM, Rafael J. Wysocki <rjw@sisk.pl> wrote: > On Tuesday, June 22, 2010, Alan Stern wrote: >> On Mon, 21 Jun 2010, Florian Mickler wrote: >> >> > > In the end you would want to have communication in both directions: >> > > suspend blockers _and_ callbacks. Polling is bad if done too often. >> > > But I think the idea is a good one. >> > >> > Actually, I'm not so shure. >> > >> > 1. you have to roundtrip whereas in the suspend_blocker scheme you have >> > active annotations (i.e. no further action needed) >> >> That's why it's best to use both. The normal case is that programs >> activate and deactivate blockers by sending one-way messages to the PM >> process. The exceptional case is when the PM process is about to >> initiate a suspend; that's when it does the round-trip polling. Since >> the only purpose of the polling is to avoid a race, 90% of the time it >> will succeed. >> >> > 2. it may not be possible for a user to determine if a wake-event is >> > in-flight. you would have to somehow pass the wake-event-number with >> > it, so that the userspace process could ack it properly without >> > confusion. Or... I don't know of anything else... >> > >> > 1. userspace-manager (UM) reads a number (42). >> > >> > 2. it questions userspace program X: is it ok to suspend? >> > >> > [please fill in how userspace program X determines to block >> > suspend] >> > >> > 3a. UM's roundtrip ends and it proceeds to write "42" to the >> > kernel [suspending] >> > 3b. UM's roundtrip ends and it aborts suspend, because a >> > (userspace-)suspend-blocker got activated >> > >> > I'm not shure how the userspace program could determine that there is a >> > wake-event in flight. Perhaps by storing the number of last wake-event. >> > But then you need per-wake-event-counters... :| >> >> Rafael seems to think timeouts will fix this. I'm not so sure. >> >> > Do you have some thoughts about the wake-event-in-flight detection? >> >> Not really, except for something like the original wakelock scheme in >> which the kernel tells the PM core when an event is over. > > But the kernel doesn't really know that, so it really can't tell the PM core > anything useful. What happens with suspend blockers is that a kernel suspend > cooperates with a user space consumer of the event to get the story straight. > > However, that will only work if the user space is not buggy and doesn't crash, > for example, before releasing the suspend blocker it's holding. > > Apart from this, there are those events withoug user space "handoff" that > use timeouts. > > Also there are events like wake-on-LAN that can be regarded as instantaneous > from the power manager's point of view, so they don't really need all of the > "suspend blockers" machinery and for them we will need to use a cooldown > timeout anyway. > > And if we need to use that cooldown timeout, I don't see why not to use > timeouts for avoiding the race you're worrying about. > Just because some events need to use timeouts, does not mean that all events should use timeouts. You may even want different timeout for different events. If I want a 5 minute timeout after a wake-on-lan packet, I don't want the same timeout after every battery check alarm (every 10 minutes on the Nexus One). Also, I don't see how this patch helps with events that never make reach user-space. Unless each driver blocks suspend by returning and error from its suspend hook, the driver then becomes dependent on the user-space power manager to choose a sufficiently long timeout. For some drivers, like the gpio keypad matrix driver there, is no safe value for the timeout. The keypad driver has to block suspend if any keys are held down. I won't have time to actively follow this discussion, but I don't think this patch is a good solution. -- Arve Hjønnevåg ^ permalink raw reply [flat|nested] 75+ messages in thread
* Re: [RFC][PATCH] PM: Avoid losing wakeup events during suspend 2010-06-21 22:40 ` Rafael J. Wysocki 2010-06-21 22:48 ` Rafael J. Wysocki 2010-06-22 0:50 ` Arve Hjønnevåg @ 2010-06-22 10:21 ` Rafael J. Wysocki 2010-06-22 14:35 ` Alan Stern 2010-06-22 23:00 ` mark gross 2 siblings, 2 replies; 75+ messages in thread From: Rafael J. Wysocki @ 2010-06-22 10:21 UTC (permalink / raw) To: Alan Stern Cc: Florian Mickler, Linux-pm mailing list, Matthew Garrett, Linux Kernel Mailing List, Dmitry Torokhov, Arve Hjønnevåg, Neil Brown, mark gross On Tuesday, June 22, 2010, Rafael J. Wysocki wrote: > On Tuesday, June 22, 2010, Alan Stern wrote: > > On Mon, 21 Jun 2010, Florian Mickler wrote: > > > > > > In the end you would want to have communication in both directions: > > > > suspend blockers _and_ callbacks. Polling is bad if done too often. > > > > But I think the idea is a good one. > > > > > > Actually, I'm not so shure. > > > > > > 1. you have to roundtrip whereas in the suspend_blocker scheme you have > > > active annotations (i.e. no further action needed) > > > > That's why it's best to use both. The normal case is that programs > > activate and deactivate blockers by sending one-way messages to the PM > > process. The exceptional case is when the PM process is about to > > initiate a suspend; that's when it does the round-trip polling. Since > > the only purpose of the polling is to avoid a race, 90% of the time it > > will succeed. > > > > > 2. it may not be possible for a user to determine if a wake-event is > > > in-flight. you would have to somehow pass the wake-event-number with > > > it, so that the userspace process could ack it properly without > > > confusion. Or... I don't know of anything else... > > > > > > 1. userspace-manager (UM) reads a number (42). > > > > > > 2. it questions userspace program X: is it ok to suspend? > > > > > > [please fill in how userspace program X determines to block > > > suspend] > > > > > > 3a. UM's roundtrip ends and it proceeds to write "42" to the > > > kernel [suspending] > > > 3b. UM's roundtrip ends and it aborts suspend, because a > > > (userspace-)suspend-blocker got activated > > > > > > I'm not shure how the userspace program could determine that there is a > > > wake-event in flight. Perhaps by storing the number of last wake-event. > > > But then you need per-wake-event-counters... :| > > > > Rafael seems to think timeouts will fix this. I'm not so sure. > > > > > Do you have some thoughts about the wake-event-in-flight detection? > > > > Not really, except for something like the original wakelock scheme in > > which the kernel tells the PM core when an event is over. > > But the kernel doesn't really know that, so it really can't tell the PM core > anything useful. What happens with suspend blockers is that a kernel subsystem > cooperates with a user space consumer of the event to get the story straight. > > However, that will only work if the user space is not buggy and doesn't crash, > for example, before releasing the suspend blocker it's holding. Having reconsidered that I think there's more to it. Take the PCI subsystem as an example, specifically pcie_pme_handle_request(). This is the place where wakeup events are started, but it has no idea about how they are going to be handled. Thus in the suspend blocker scheme it would need to activate a blocker, but it wouldn't be able to release it. So, it seems, we would need to associate a suspend blocker with each PCIe device that can generate wakeup events and require all drivers of such devices to deal with a blocker activated by someone else (PCIe PME driver in this particular case). That sounds cumbersome to say the least. Moreover, even if we do that, it still doesn't solve the entire problem, because the event may need to be delivered to user space and processed by it. While a driver can check if user space has already read the event, it has no way to detect whether or not it has finished processing it. In fact, processing an event may involve an interaction with a (human) user and there's no general way by which software can figure out when the user considers the event as processed. It looks like user space suspend blockers only help in some special cases when the user space processing of a wakeup event is simple enough, but I don't think they help in general. For an extreme example, a user may want to wake up a system using wake-on-LAN to log into it, do some work and log out, so effectively the initial wakeup event has not been processed entirely until the user finally logs out of the system. Now, after the system wakeup (resulting from the wake-on-LAN signal) we need to give the user some time to log in, but if the user doesn't do that in certain time, it may be reasonable to suspend and let the user wake up the system again. Similar situation takes place when the system is woken up by a lid switch. Evidently, the user has opened the laptop lid to do something, but we don't even know what the user is going to do, so there's no way we can say when the wakeup event is finally processed. So, even if we can say when the kernel has finished processing the event (although that would be complicated in the PCIe case above), I don't think it's generally possible to ensure that the entire processing of a wakeup event has been completed. This leads to the question whether or not it is worth trying to detect the ending of the processing of a wakeup event. Now, going back to the $subject patch, I didn't really think it would be suitable for opportunistic suspend, so let's focus on the "forced" suspend instead. It still has the problem that wakeup events occuring while /sys/power/state is written to (or even slightly before) should cause the system to cancel the suspend, but they generally won't. With the patch applied that can be avoided by (a) reading from /sys/power/wakeup_count, (b) waiting for certain time (such that if a suspend event is not entirely processed within that time, it's worth suspending and waking up the system again) and (c) writing to /sys/power/wakeup_count right before writing to /sys/power/state (where the latter is only done if the former succeeds). Rafael ^ permalink raw reply [flat|nested] 75+ messages in thread
* Re: [RFC][PATCH] PM: Avoid losing wakeup events during suspend 2010-06-22 10:21 ` Rafael J. Wysocki @ 2010-06-22 14:35 ` Alan Stern 2010-06-22 15:35 ` Rafael J. Wysocki 2010-06-22 23:00 ` mark gross 1 sibling, 1 reply; 75+ messages in thread From: Alan Stern @ 2010-06-22 14:35 UTC (permalink / raw) To: Rafael J. Wysocki Cc: Florian Mickler, Linux-pm mailing list, Matthew Garrett, Linux Kernel Mailing List, Dmitry Torokhov, Arve Hjønnevåg, Neil Brown, mark gross On Tue, 22 Jun 2010, Rafael J. Wysocki wrote: > Having reconsidered that I think there's more to it. > > Take the PCI subsystem as an example, specifically pcie_pme_handle_request(). > This is the place where wakeup events are started, but it has no idea about > how they are going to be handled. Thus in the suspend blocker scheme it would > need to activate a blocker, but it wouldn't be able to release it. So, it > seems, we would need to associate a suspend blocker with each PCIe device > that can generate wakeup events and require all drivers of such devices to > deal with a blocker activated by someone else (PCIe PME driver in this > particular case). That sounds cumbersome to say the least. Maybe this means pcie_pme_handle_request() isn't a good place to note the arrival of a wakeup event. Doing it in the individual driver might work out better. Or look at this from a different point of view. Adopting Mark's terminology, let's say that the kernel is in a "critical section" at times when we don't want the system to suspend. Currently there's no way to tell the PM core when the kernel enters or leaves a critical section, so there's no way to prevent the system from suspending at the wrong time. Most wakeup events indicate the start of a critical section, in the sense that you hardly ever say: "I want the computer to wake up when I press this button, but I don't care what it does afterward -- it can go right back to sleep without doing anything if it wants." Much more common is that a wakeup event requires a certain amount of processing, and you don't want the system to go back to sleep until the processing is over. Of course, if the processing is simple enough that it can all be done in an interrupt handler or a resume method, then nothing extra is needed since obviously the system won't suspend while an interrupt handler or a resume method is running. But for more complicated cases, we need to do more. The problem in your example is that pcie_pme_handle_request() has no idea about the nature or extent of the critical section to follow. Therefore it's not in a good position to mark the beginning of the critical section, even though it is in an excellent position to mark the receipt of a wakeup event. > Moreover, even if we do that, it still doesn't solve the entire problem, > because the event may need to be delivered to user space and processed by it. > While a driver can check if user space has already read the event, it has > no way to detect whether or not it has finished processing it. In fact, > processing an event may involve an interaction with a (human) user and there's > no general way by which software can figure out when the user considers the > event as processed. Is this the kernel's problem? Once userspace has read the event, we can safely say that the kernel's critical section is over. Perhaps a userspace critical section has begun, perhaps not; either way, it's no longer the kernel's responsibility. > It looks like user space suspend blockers only help in some special cases > when the user space processing of a wakeup event is simple enough, but I don't > think they help in general. For an extreme example, a user may want to wake up > a system using wake-on-LAN to log into it, do some work and log out, so > effectively the initial wakeup event has not been processed entirely until the > user finally logs out of the system. Now, after the system wakeup (resulting > from the wake-on-LAN signal) we need to give the user some time to log in, but > if the user doesn't do that in certain time, it may be reasonable to suspend > and let the user wake up the system again. I agree. This is a case where there is no clear-cut end to the kernel's critical section, because the event is not handed over to userspace. A reasonable approach would be to use a timeout, perhaps also with some heuristic like: End the critical section early if we receive 100(?) more network packets before the timeout expires. > Similar situation takes place when the system is woken up by a lid switch. > Evidently, the user has opened the laptop lid to do something, but we don't > even know what the user is going to do, so there's no way we can say when > the wakeup event is finally processed. In this case, the kernel could inform an appropriate user process (some part of DeviceKit? or the power-manager process?) about the lid-switch event. Once that information had been passed on, the kernel's critical section would be over. The process could start its own critical section or not, as it sees fit. If there is no process to send the information to, the kernel could again end the critical section after a reasonable timeout (1 minute?). > So, even if we can say when the kernel has finished processing the event > (although that would be complicated in the PCIe case above), I don't think > it's generally possible to ensure that the entire processing of a wakeup event > has been completed. This leads to the question whether or not it is worth > trying to detect the ending of the processing of a wakeup event. As Arve pointed out, in some cases it definitely is worthwhile (the gpio keypad matrix example). In other cases there may be no reasonable way to tell. That doesn't mean we have to give up entirely. > Now, going back to the $subject patch, I didn't really think it would be > suitable for opportunistic suspend, so let's focus on the "forced" suspend > instead. It still has the problem that wakeup events occuring while > /sys/power/state is written to (or even slightly before) should cause the > system to cancel the suspend, but they generally won't. With the patch > applied that can be avoided by (a) reading from /sys/power/wakeup_count, > (b) waiting for certain time (such that if a suspend event is not entirely > processed within that time, it's worth suspending and waking up the > system again) and (c) writing to /sys/power/wakeup_count right before writing > to /sys/power/state (where the latter is only done if the former succeeds). In other words, you could detect if a critical section begins after the user process has decided to initiate a suspend. Yes, that's so. On the other hand, we should already be able to abort a suspend if a wakeup event arrives after tasks are frozen (to pick a reasonable boundary point). If we can't -- if some wakeup events are able to slip through our fingers -- I would say it's a bug and the relevant drivers need to be fixed. In the end this probably will require adding a function to notify the PM core that a wakeup event occurred and therefore a suspend-in-progress should be aborted -- almost exactly what pm_wakeup_event() does. So I'm not opposed to the new function. But it doesn't solve the entire problem of avoiding suspends during critical sections. Alan Stern ^ permalink raw reply [flat|nested] 75+ messages in thread
* Re: [RFC][PATCH] PM: Avoid losing wakeup events during suspend 2010-06-22 14:35 ` Alan Stern @ 2010-06-22 15:35 ` Rafael J. Wysocki 2010-06-22 19:55 ` Alan Stern ` (2 more replies) 0 siblings, 3 replies; 75+ messages in thread From: Rafael J. Wysocki @ 2010-06-22 15:35 UTC (permalink / raw) To: Alan Stern Cc: Florian Mickler, Linux-pm mailing list, Matthew Garrett, Linux Kernel Mailing List, Dmitry Torokhov, Arve Hjønnevåg, Neil Brown, mark gross On Tuesday, June 22, 2010, Alan Stern wrote: > On Tue, 22 Jun 2010, Rafael J. Wysocki wrote: > > > Having reconsidered that I think there's more to it. > > > > Take the PCI subsystem as an example, specifically pcie_pme_handle_request(). > > This is the place where wakeup events are started, but it has no idea about > > how they are going to be handled. Thus in the suspend blocker scheme it would > > need to activate a blocker, but it wouldn't be able to release it. So, it > > seems, we would need to associate a suspend blocker with each PCIe device > > that can generate wakeup events and require all drivers of such devices to > > deal with a blocker activated by someone else (PCIe PME driver in this > > particular case). That sounds cumbersome to say the least. > > Maybe this means pcie_pme_handle_request() isn't a good place to note > the arrival of a wakeup event. Doing it in the individual driver might > work out better. But it this case there will be an open window in which suspend may be started after the wakeup event is signaled, but before the PM core is told about it. > Or look at this from a different point of view. Adopting Mark's > terminology, let's say that the kernel is in a "critical section" at > times when we don't want the system to suspend. Currently there's no > way to tell the PM core when the kernel enters or leaves a critical > section, so there's no way to prevent the system from suspending at the > wrong time. > > Most wakeup events indicate the start of a critical section, in the > sense that you hardly ever say: "I want the computer to wake up when I > press this button, but I don't care what it does afterward -- it can go > right back to sleep without doing anything if it wants." Much more > common is that a wakeup event requires a certain amount of processing, > and you don't want the system to go back to sleep until the processing > is over. Of course, if the processing is simple enough that it can all > be done in an interrupt handler or a resume method, then nothing > extra is needed since obviously the system won't suspend while an > interrupt handler or a resume method is running. But for more > complicated cases, we need to do more. > > The problem in your example is that pcie_pme_handle_request() has no > idea about the nature or extent of the critical section to follow. Exactly. > Therefore it's not in a good position to mark the beginning of the > critical section, even though it is in an excellent position to mark > the receipt of a wakeup event. I think we have no choice but to regard the detection of a wakeup event as the beginning of a "suspend critical section". > > Moreover, even if we do that, it still doesn't solve the entire problem, > > because the event may need to be delivered to user space and processed by it. > > While a driver can check if user space has already read the event, it has > > no way to detect whether or not it has finished processing it. In fact, > > processing an event may involve an interaction with a (human) user and there's > > no general way by which software can figure out when the user considers the > > event as processed. > > Is this the kernel's problem? Once userspace has read the event, we > can safely say that the kernel's critical section is over. Perhaps a > userspace critical section has begun, perhaps not; either way, it's no > longer the kernel's responsibility. Well, I agree here, but in the suspend blockers world it is the kernel responsibility, because the kernel contains the power manager part. > > It looks like user space suspend blockers only help in some special cases > > when the user space processing of a wakeup event is simple enough, but I don't > > think they help in general. For an extreme example, a user may want to wake up > > a system using wake-on-LAN to log into it, do some work and log out, so > > effectively the initial wakeup event has not been processed entirely until the > > user finally logs out of the system. Now, after the system wakeup (resulting > > from the wake-on-LAN signal) we need to give the user some time to log in, but > > if the user doesn't do that in certain time, it may be reasonable to suspend > > and let the user wake up the system again. > > I agree. This is a case where there is no clear-cut end to the > kernel's critical section, because the event is not handed over to > userspace. A reasonable approach would be to use a timeout, perhaps > also with some heuristic like: End the critical section early if we > receive 100(?) more network packets before the timeout expires. Exactly. > > Similar situation takes place when the system is woken up by a lid switch. > > Evidently, the user has opened the laptop lid to do something, but we don't > > even know what the user is going to do, so there's no way we can say when > > the wakeup event is finally processed. > > In this case, the kernel could inform an appropriate user process (some > part of DeviceKit? or the power-manager process?) about the lid-switch > event. Once that information had been passed on, the kernel's critical > section would be over. The process could start its own critical > section or not, as it sees fit. > > If there is no process to send the information to, the kernel could > again end the critical section after a reasonable timeout (1 minute?). Agreed. > > So, even if we can say when the kernel has finished processing the event > > (although that would be complicated in the PCIe case above), I don't think > > it's generally possible to ensure that the entire processing of a wakeup event > > has been completed. This leads to the question whether or not it is worth > > trying to detect the ending of the processing of a wakeup event. > > As Arve pointed out, in some cases it definitely is worthwhile (the > gpio keypad matrix example). In other cases there may be no reasonable > way to tell. That doesn't mean we have to give up entirely. Well, I'm not sure, because that really depends on the hardware and bus in question. The necessary condition seems to be that the event be detected and handled entirely by the same functional unit (eg. a device driver) within the kernel and such that it is able to detect whether or not user space has acquired the event information. That doesn't seem to be a common case to me. > > Now, going back to the $subject patch, I didn't really think it would be > > suitable for opportunistic suspend, so let's focus on the "forced" suspend > > instead. It still has the problem that wakeup events occuring while > > /sys/power/state is written to (or even slightly before) should cause the > > system to cancel the suspend, but they generally won't. With the patch > > applied that can be avoided by (a) reading from /sys/power/wakeup_count, > > (b) waiting for certain time (such that if a suspend event is not entirely > > processed within that time, it's worth suspending and waking up the > > system again) and (c) writing to /sys/power/wakeup_count right before writing > > to /sys/power/state (where the latter is only done if the former succeeds). > > In other words, you could detect if a critical section begins after the > user process has decided to initiate a suspend. Yes, that's so. Generally yes, although I think it will also detect "critical sections" starting exactly at the moment the suspend is started. Which in fact is the purpose of the patch. > On the other hand, we should already be able to abort a suspend if a > wakeup event arrives after tasks are frozen (to pick a reasonable > boundary point). If we can't -- if some wakeup events are able to slip > through our fingers -- I would say it's a bug and the relevant drivers > need to be fixed. In the end this probably will require adding a > function to notify the PM core that a wakeup event occurred and > therefore a suspend-in-progress should be aborted -- almost exactly > what pm_wakeup_event() does. That's correct. > So I'm not opposed to the new function. But it doesn't solve the > entire problem of avoiding suspends during critical sections. Surely not and it isn't my goal at this point. I think there are a few different issues that the suspend blockers (or wakelocks) framework attempts to address in one bit hammer. To me, they are at least (1) deciding when to suspend, (2) detecting events that should make us avoid suspending (or abort suspend if already started), (3) preventing "untrusted" processes from making the system use too much energy. IMO it's better to treat them as different issues and try to address them separately. Rafael ^ permalink raw reply [flat|nested] 75+ messages in thread
* Re: [RFC][PATCH] PM: Avoid losing wakeup events during suspend 2010-06-22 15:35 ` Rafael J. Wysocki @ 2010-06-22 19:55 ` Alan Stern 2010-06-22 20:58 ` Rafael J. Wysocki 2010-06-22 19:59 ` [update] " Rafael J. Wysocki [not found] ` <201006222159.28081.rjw__37084.1419128284$1277237903$gmane$org@sisk.pl> 2 siblings, 1 reply; 75+ messages in thread From: Alan Stern @ 2010-06-22 19:55 UTC (permalink / raw) To: Rafael J. Wysocki Cc: Florian Mickler, Linux-pm mailing list, Matthew Garrett, Linux Kernel Mailing List, Dmitry Torokhov, Arve Hjønnevåg, Neil Brown, mark gross On Tue, 22 Jun 2010, Rafael J. Wysocki wrote: > On Tuesday, June 22, 2010, Alan Stern wrote: > > On Tue, 22 Jun 2010, Rafael J. Wysocki wrote: > > > > > Having reconsidered that I think there's more to it. > > > > > > Take the PCI subsystem as an example, specifically pcie_pme_handle_request(). > > > This is the place where wakeup events are started, but it has no idea about > > > how they are going to be handled. Thus in the suspend blocker scheme it would > > > need to activate a blocker, but it wouldn't be able to release it. So, it > > > seems, we would need to associate a suspend blocker with each PCIe device > > > that can generate wakeup events and require all drivers of such devices to > > > deal with a blocker activated by someone else (PCIe PME driver in this > > > particular case). That sounds cumbersome to say the least. > > > > Maybe this means pcie_pme_handle_request() isn't a good place to note > > the arrival of a wakeup event. Doing it in the individual driver might > > work out better. > > But it this case there will be an open window in which suspend may be started > after the wakeup event is signaled, but before the PM core is told about it. That's true but it doesn't matter, assuming the suspend can't progress during this window. > > The problem in your example is that pcie_pme_handle_request() has no > > idea about the nature or extent of the critical section to follow. > > Exactly. > > > Therefore it's not in a good position to mark the beginning of the > > critical section, even though it is in an excellent position to mark > > the receipt of a wakeup event. > > I think we have no choice but to regard the detection of a wakeup event as the > beginning of a "suspend critical section". Receipt of a wakeup event triggers a whole series of function calls, including calls to the resume methods of every driver. The system should be designed so that the next suspend can't begin until those function calls complete. For example, the next suspend certainly can't begin before the resume methods all complete. Given that premise, any one of those functions could serve as the start of a suspend critical section. > > > Moreover, even if we do that, it still doesn't solve the entire problem, > > > because the event may need to be delivered to user space and processed by it. > > > While a driver can check if user space has already read the event, it has > > > no way to detect whether or not it has finished processing it. In fact, > > > processing an event may involve an interaction with a (human) user and there's > > > no general way by which software can figure out when the user considers the > > > event as processed. > > > > Is this the kernel's problem? Once userspace has read the event, we > > can safely say that the kernel's critical section is over. Perhaps a > > userspace critical section has begun, perhaps not; either way, it's no > > longer the kernel's responsibility. > > Well, I agree here, but in the suspend blockers world it is the kernel > responsibility, because the kernel contains the power manager part. In the suspend blockers world (or at least, in Android's version of the suspend blockers world), userspace would activate another suspend blocker before reading the event. The kernel's critical section could then end safely once the event was read. > > > So, even if we can say when the kernel has finished processing the event > > > (although that would be complicated in the PCIe case above), I don't think > > > it's generally possible to ensure that the entire processing of a wakeup event > > > has been completed. This leads to the question whether or not it is worth > > > trying to detect the ending of the processing of a wakeup event. > > > > As Arve pointed out, in some cases it definitely is worthwhile (the > > gpio keypad matrix example). In other cases there may be no reasonable > > way to tell. That doesn't mean we have to give up entirely. > > Well, I'm not sure, because that really depends on the hardware and bus in > question. The necessary condition seems to be that the event be detected > and handled entirely by the same functional unit (eg. a device driver) within > the kernel and such that it is able to detect whether or not user space has > acquired the event information. That doesn't seem to be a common case to me. It's hard to say how common this is without having a list of possible wakeup sources. And of course, that list will differ from one platform to another. > > > Now, going back to the $subject patch, I didn't really think it would be > > > suitable for opportunistic suspend, so let's focus on the "forced" suspend > > > instead. It still has the problem that wakeup events occuring while > > > /sys/power/state is written to (or even slightly before) should cause the > > > system to cancel the suspend, but they generally won't. With the patch > > > applied that can be avoided by (a) reading from /sys/power/wakeup_count, > > > (b) waiting for certain time (such that if a suspend event is not entirely > > > processed within that time, it's worth suspending and waking up the > > > system again) and (c) writing to /sys/power/wakeup_count right before writing > > > to /sys/power/state (where the latter is only done if the former succeeds). > > > > In other words, you could detect if a critical section begins after the > > user process has decided to initiate a suspend. Yes, that's so. > > Generally yes, although I think it will also detect "critical sections" > starting exactly at the moment the suspend is started. Which in fact is the > purpose of the patch. Well, the moment the suspend is started _does_ occur after the user process decides to initiate a suspend. Hence critical sections starting exactly at that moment would indeed be detected. > > So I'm not opposed to the new function. But it doesn't solve the > > entire problem of avoiding suspends during critical sections. > > Surely not and it isn't my goal at this point. > > I think there are a few different issues that the suspend blockers (or > wakelocks) framework attempts to address in one bit hammer. To me, they are > at least (1) deciding when to suspend, (2) detecting events that should make > us avoid suspending (or abort suspend if already started), (3) preventing > "untrusted" processes from making the system use too much energy. > > IMO it's better to treat them as different issues and try to address them > separately. Certainly (3) needs to be addressed separately. It should be handled completely within userspace, if at all possible. (1) and (2) are closely related. In fact, a reasonable criterion for (1) might be: Suspend whenever it is allowed. Then (2) becomes: What sorts of things should disallow suspend, and for how long? Evidently part of the problem here is that for a very long time (predating the existence of Linux), people have been using a bad abstraction. We talk about "wakeup events", but an event occupies only a single moment in time. If the computer happens to be asleep at that moment then it wakes up, fine. But if it was already awake, then once the moment is passed there's no reason not to suspend -- even if only 1 microsecond has elapsed. Likewise, if an event causes the computer to wake up, then once the computer is awake the moment is over and there's nothing to prevent the computer from immediately going back to sleep. Instead of talking about events, we should be talking about procedures or sections: something that happens over a non-zero period of time. But people have never thought in terms of wakeup procedures or suspend critical sections, and so the kernel isn't designed to accomodate them. We may know when they begin, but we often have only a cloudy idea of when they end. Historically, people have mostly had in mind that the answer to (1) would be: Suspend whenever the user tells the computer to suspend. In that kind of setting, (2) doesn't matter: When the user tells the machine to suspend, it should obey. But when we start to consider energy conservation and autonomous (or opportunistic) suspend, things become more complex. This is why, for example, the USB subsystem has a user-selectable autosuspend delay. It's not an ideal solution, but it does prevent us from thrashing between suspending and resuming a device over and over if it gets used repeatedly during a short period of time. We can design mechanisms until we are blue in the face (some of us may be blue already!), but until we remedy this weakness in our thinking we won't know how to apply them. Which means people won't be able to agree on a single correct approach. Alan Stern ^ permalink raw reply [flat|nested] 75+ messages in thread
* Re: [RFC][PATCH] PM: Avoid losing wakeup events during suspend 2010-06-22 19:55 ` Alan Stern @ 2010-06-22 20:58 ` Rafael J. Wysocki 0 siblings, 0 replies; 75+ messages in thread From: Rafael J. Wysocki @ 2010-06-22 20:58 UTC (permalink / raw) To: Alan Stern Cc: Florian Mickler, Linux-pm mailing list, Matthew Garrett, Linux Kernel Mailing List, Dmitry Torokhov, Arve Hjønnevåg, Neil Brown, mark gross On Tuesday, June 22, 2010, Alan Stern wrote: > On Tue, 22 Jun 2010, Rafael J. Wysocki wrote: > > > On Tuesday, June 22, 2010, Alan Stern wrote: > > > On Tue, 22 Jun 2010, Rafael J. Wysocki wrote: > > > > > > > Having reconsidered that I think there's more to it. > > > > > > > > Take the PCI subsystem as an example, specifically pcie_pme_handle_request(). > > > > This is the place where wakeup events are started, but it has no idea about > > > > how they are going to be handled. Thus in the suspend blocker scheme it would > > > > need to activate a blocker, but it wouldn't be able to release it. So, it > > > > seems, we would need to associate a suspend blocker with each PCIe device > > > > that can generate wakeup events and require all drivers of such devices to > > > > deal with a blocker activated by someone else (PCIe PME driver in this > > > > particular case). That sounds cumbersome to say the least. > > > > > > Maybe this means pcie_pme_handle_request() isn't a good place to note > > > the arrival of a wakeup event. Doing it in the individual driver might > > > work out better. > > > > But it this case there will be an open window in which suspend may be started > > after the wakeup event is signaled, but before the PM core is told about it. > > That's true but it doesn't matter, assuming the suspend can't progress > during this window. > > > > The problem in your example is that pcie_pme_handle_request() has no > > > idea about the nature or extent of the critical section to follow. > > > > Exactly. > > > > > Therefore it's not in a good position to mark the beginning of the > > > critical section, even though it is in an excellent position to mark > > > the receipt of a wakeup event. > > > > I think we have no choice but to regard the detection of a wakeup event as the > > beginning of a "suspend critical section". > > Receipt of a wakeup event triggers a whole series of function calls, > including calls to the resume methods of every driver. The system > should be designed so that the next suspend can't begin until those > function calls complete. For example, the next suspend certainly can't > begin before the resume methods all complete. Given that premise, any > one of those functions could serve as the start of a suspend critical > section. Well, consider pcie_pme_handle_request() again. It certainly can be called during suspend (until the PME interrupt is disabled), but the PM workqueue is frozen at this point, so the device driver's resume routine won't be called. However, the wakeup signal from the device should be regarded as a wakeup event in that case IMO. [We have a check for that in dpm_prepare(), but I think it should be replaced by the "proper" handling of wakeup events, once we have one.] ... > > > > So, even if we can say when the kernel has finished processing the event > > > > (although that would be complicated in the PCIe case above), I don't think > > > > it's generally possible to ensure that the entire processing of a wakeup event > > > > has been completed. This leads to the question whether or not it is worth > > > > trying to detect the ending of the processing of a wakeup event. > > > > > > As Arve pointed out, in some cases it definitely is worthwhile (the > > > gpio keypad matrix example). In other cases there may be no reasonable > > > way to tell. That doesn't mean we have to give up entirely. > > > > Well, I'm not sure, because that really depends on the hardware and bus in > > question. The necessary condition seems to be that the event be detected > > and handled entirely by the same functional unit (eg. a device driver) within > > the kernel and such that it is able to detect whether or not user space has > > acquired the event information. That doesn't seem to be a common case to me. > > It's hard to say how common this is without having a list of possible > wakeup sources. And of course, that list will differ from one platform > to another. Agreed. ... > > I think there are a few different issues that the suspend blockers (or > > wakelocks) framework attempts to address in one bit hammer. To me, they are > > at least (1) deciding when to suspend, (2) detecting events that should make > > us avoid suspending (or abort suspend if already started), (3) preventing > > "untrusted" processes from making the system use too much energy. > > > > IMO it's better to treat them as different issues and try to address them > > separately. > > Certainly (3) needs to be addressed separately. It should be handled > completely within userspace, if at all possible. > > (1) and (2) are closely related. In fact, a reasonable criterion for > (1) might be: Suspend whenever it is allowed. Then (2) becomes: What > sorts of things should disallow suspend, and for how long? The events I mean by (2) are the minimal subset of conditions used in (1), because the user may add more restrictions that should be checked by user space. For example, the user may request to suspend whenever there are no open SSH connections to the machine (why not?), but even if that criterion is satisfied, wake-on-LAN events should prevent suspend from happening. > Evidently part of the problem here is that for a very long time > (predating the existence of Linux), people have been using a bad > abstraction. We talk about "wakeup events", but an event occupies only > a single moment in time. If the computer happens to be asleep at that > moment then it wakes up, fine. But if it was already awake, then once > the moment is passed there's no reason not to suspend -- even if only > 1 microsecond has elapsed. Likewise, if an event causes the computer > to wake up, then once the computer is awake the moment is over and > there's nothing to prevent the computer from immediately going back to > sleep. > > Instead of talking about events, we should be talking about procedures > or sections: something that happens over a non-zero period of time. Agreed. > But people have never thought in terms of wakeup procedures or suspend > critical sections, and so the kernel isn't designed to accomodate them. > We may know when they begin, but we often have only a cloudy idea of > when they end. Yeah. > Historically, people have mostly had in mind that the answer to (1) > would be: Suspend whenever the user tells the computer to suspend. In > that kind of setting, (2) doesn't matter: When the user tells the > machine to suspend, it should obey. Well, not necessarily, because the user can change his mind while the machine is suspending and try to generate a wakeup event to abort the suspend. > But when we start to consider energy conservation and autonomous (or > opportunistic) suspend, things become more complex. This is why, for > example, the USB subsystem has a user-selectable autosuspend delay. > It's not an ideal solution, but it does prevent us from thrashing > between suspending and resuming a device over and over if it gets used > repeatedly during a short period of time. > > We can design mechanisms until we are blue in the face (some of us may > be blue already!), but until we remedy this weakness in our thinking we > won't know how to apply them. Which means people won't be able to > agree on a single correct approach. I agree 100%. Rafael ^ permalink raw reply [flat|nested] 75+ messages in thread
* [update] Re: [RFC][PATCH] PM: Avoid losing wakeup events during suspend 2010-06-22 15:35 ` Rafael J. Wysocki 2010-06-22 19:55 ` Alan Stern @ 2010-06-22 19:59 ` Rafael J. Wysocki 2010-06-22 20:34 ` Alan Stern [not found] ` <201006222159.28081.rjw__37084.1419128284$1277237903$gmane$org@sisk.pl> 2 siblings, 1 reply; 75+ messages in thread From: Rafael J. Wysocki @ 2010-06-22 19:59 UTC (permalink / raw) To: Alan Stern Cc: Florian Mickler, Linux-pm mailing list, Matthew Garrett, Linux Kernel Mailing List, Dmitry Torokhov, Arve Hjønnevåg, Neil Brown, mark gross On Tuesday, June 22, 2010, Rafael J. Wysocki wrote: > On Tuesday, June 22, 2010, Alan Stern wrote: > > On Tue, 22 Jun 2010, Rafael J. Wysocki wrote: ... > > > So, even if we can say when the kernel has finished processing the event > > > (although that would be complicated in the PCIe case above), I don't think > > > it's generally possible to ensure that the entire processing of a wakeup event > > > has been completed. This leads to the question whether or not it is worth > > > trying to detect the ending of the processing of a wakeup event. > > > > As Arve pointed out, in some cases it definitely is worthwhile (the > > gpio keypad matrix example). In other cases there may be no reasonable > > way to tell. That doesn't mean we have to give up entirely. > > Well, I'm not sure, because that really depends on the hardware and bus in > question. The necessary condition seems to be that the event be detected > and handled entirely by the same functional unit (eg. a device driver) within > the kernel and such that it is able to detect whether or not user space has > acquired the event information. That doesn't seem to be a common case to me. Anyway, below's an update that addresses this particular case. It adds two more functions, pm_wakeup_begin() and pm_wakeup_end() that play similar roles to suspend_block() and suspend_unblock(), but they don't operate on suspend blocker objects. Instead, the first of them increases a counter of events in progress and the other one decreases this counter. Together they have the same effect as pm_wakeup_event(), but the counter of wakeup events in progress they operate on is also checked by pm_check_wakeup_events(). Thus there are two ways kernel subsystems can signal wakeup events. First, if the event is not explicitly handed over to user space and "instantaneous", they can simply call pm_wakeup_event() and be done with it. Second, if the event is going to be delivered to user space, the subsystem that processes the event can call pm_wakeup_begin() right when the event is detected and pm_wakeup_end() when it's been handed over to user space. Please tell me what you think. Rafael --- drivers/base/power/Makefile | 2 drivers/base/power/main.c | 1 drivers/base/power/power.h | 3 + drivers/base/power/sysfs.c | 9 +++ drivers/base/power/wakeup.c | 113 ++++++++++++++++++++++++++++++++++++++++ drivers/pci/pci-acpi.c | 2 drivers/pci/pcie/pme/pcie_pme.c | 2 include/linux/pm.h | 10 +++ kernel/power/hibernate.c | 14 +++- kernel/power/main.c | 24 ++++++++ kernel/power/power.h | 7 ++ kernel/power/suspend.c | 2 12 files changed, 182 insertions(+), 7 deletions(-) Index: linux-2.6/kernel/power/main.c =================================================================== --- linux-2.6.orig/kernel/power/main.c +++ linux-2.6/kernel/power/main.c @@ -204,6 +204,28 @@ static ssize_t state_store(struct kobjec power_attr(state); +static ssize_t wakeup_count_show(struct kobject *kobj, + struct kobj_attribute *attr, + char *buf) +{ + return sprintf(buf, "%lu\n", pm_get_wakeup_count()); +} + +static ssize_t wakeup_count_store(struct kobject *kobj, + struct kobj_attribute *attr, + const char *buf, size_t n) +{ + unsigned long val; + + if (sscanf(buf, "%lu", &val) == 1) { + if (pm_save_wakeup_count(val)) + return n; + } + return -EINVAL; +} + +power_attr(wakeup_count); + #ifdef CONFIG_PM_TRACE int pm_trace_enabled; @@ -236,6 +258,7 @@ static struct attribute * g[] = { #endif #ifdef CONFIG_PM_SLEEP &pm_async_attr.attr, + &wakeup_count_attr.attr, #ifdef CONFIG_PM_DEBUG &pm_test_attr.attr, #endif @@ -266,6 +289,7 @@ static int __init pm_init(void) int error = pm_start_workqueue(); if (error) return error; + pm_wakeup_events_init(); power_kobj = kobject_create_and_add("power", NULL); if (!power_kobj) return -ENOMEM; Index: linux-2.6/drivers/base/power/wakeup.c =================================================================== --- /dev/null +++ linux-2.6/drivers/base/power/wakeup.c @@ -0,0 +1,113 @@ + +#include <linux/device.h> +#include <linux/pm.h> + +static unsigned long event_count; +static unsigned long saved_event_count; +static unsigned long events_in_progress; +static bool events_check_enabled; +static spinlock_t events_lock; + +void pm_wakeup_events_init(void) +{ + spin_lock_init(&events_lock); +} + +void pm_wakeup_event(struct device *dev) +{ + unsigned long flags; + + spin_lock_irqsave(&events_lock, flags); + event_count++; + if (dev) + dev->power.wakeup_count++; + spin_unlock_irqrestore(&events_lock, flags); +} + +void pm_wakeup_begin(struct device *dev) +{ + unsigned long flags; + + spin_lock_irqsave(&events_lock, flags); + events_in_progress++; + if (dev) + dev->power.wakeup_count++; + spin_unlock_irqrestore(&events_lock, flags); +} + +void pm_wakeup_end(void) +{ + unsigned long flags; + + spin_lock_irqsave(&events_lock, flags); + if (events_in_progress) { + events_in_progress--; + event_count++; + } + spin_unlock_irqrestore(&events_lock, flags); +} + +static bool check_wakeup_events(void) +{ + return !events_check_enabled + || ((event_count == saved_event_count) && !events_in_progress); +} + +bool pm_check_wakeup_events(void) +{ + unsigned long flags; + bool ret; + + spin_lock_irqsave(&events_lock, flags); + ret = check_wakeup_events(); + events_check_enabled = ret; + spin_unlock_irqrestore(&events_lock, flags); + return ret; +} + +bool pm_check_wakeup_events_final(void) +{ + bool ret; + + ret = check_wakeup_events(); + events_check_enabled = false; + return ret; +} + +unsigned long pm_get_wakeup_count(void) +{ + unsigned long flags; + unsigned long count; + + spin_lock_irqsave(&events_lock, flags); + events_check_enabled = false; + count = event_count; + spin_unlock_irqrestore(&events_lock, flags); + return count; +} + +bool pm_save_wakeup_count(unsigned long count) +{ + unsigned long flags; + bool ret = false; + + spin_lock_irqsave(&events_lock, flags); + if (count == event_count) { + saved_event_count = count; + events_check_enabled = true; + ret = true; + } + spin_unlock_irqrestore(&events_lock, flags); + return ret; +} + +unsigned long pm_dev_wakeup_count(struct device *dev) +{ + unsigned long flags; + unsigned long count; + + spin_lock_irqsave(&events_lock, flags); + count = dev->power.wakeup_count; + spin_unlock_irqrestore(&events_lock, flags); + return count; +} Index: linux-2.6/include/linux/pm.h =================================================================== --- linux-2.6.orig/include/linux/pm.h +++ linux-2.6/include/linux/pm.h @@ -457,6 +457,7 @@ struct dev_pm_info { #ifdef CONFIG_PM_SLEEP struct list_head entry; struct completion completion; + unsigned long wakeup_count; #endif #ifdef CONFIG_PM_RUNTIME struct timer_list suspend_timer; @@ -552,6 +553,11 @@ extern void __suspend_report_result(cons } while (0) extern void device_pm_wait_for_dev(struct device *sub, struct device *dev); + +/* drivers/base/power/wakeup.c */ +extern void pm_wakeup_event(struct device *dev); +extern void pm_wakeup_begin(struct device *dev); +extern void pm_wakeup_end(void); #else /* !CONFIG_PM_SLEEP */ #define device_pm_lock() do {} while (0) @@ -565,6 +571,10 @@ static inline int dpm_suspend_start(pm_m #define suspend_report_result(fn, ret) do {} while (0) static inline void device_pm_wait_for_dev(struct device *a, struct device *b) {} + +static inline void pm_wakeup_event(struct device *dev) {} +static inline void pm_wakeup_begin(struct device *dev) {} +static inline void pm_wakeup_end(void) {} #endif /* !CONFIG_PM_SLEEP */ /* How to reorder dpm_list after device_move() */ Index: linux-2.6/drivers/base/power/Makefile =================================================================== --- linux-2.6.orig/drivers/base/power/Makefile +++ linux-2.6/drivers/base/power/Makefile @@ -1,5 +1,5 @@ obj-$(CONFIG_PM) += sysfs.o -obj-$(CONFIG_PM_SLEEP) += main.o +obj-$(CONFIG_PM_SLEEP) += main.o wakeup.o obj-$(CONFIG_PM_RUNTIME) += runtime.o obj-$(CONFIG_PM_OPS) += generic_ops.o obj-$(CONFIG_PM_TRACE_RTC) += trace.o Index: linux-2.6/drivers/base/power/main.c =================================================================== --- linux-2.6.orig/drivers/base/power/main.c +++ linux-2.6/drivers/base/power/main.c @@ -59,6 +59,7 @@ void device_pm_init(struct device *dev) { dev->power.status = DPM_ON; init_completion(&dev->power.completion); + dev->power.wakeup_count = 0; pm_runtime_init(dev); } Index: linux-2.6/kernel/power/power.h =================================================================== --- linux-2.6.orig/kernel/power/power.h +++ linux-2.6/kernel/power/power.h @@ -184,6 +184,13 @@ static inline void suspend_test_finish(c #ifdef CONFIG_PM_SLEEP /* kernel/power/main.c */ extern int pm_notifier_call_chain(unsigned long val); + +/* drivers/base/power/wakeup.c */ +extern void pm_wakeup_events_init(void); +extern bool pm_check_wakeup_events(void); +extern bool pm_check_wakeup_events_final(void); +extern unsigned long pm_get_wakeup_count(void); +extern bool pm_save_wakeup_count(unsigned long count); #endif #ifdef CONFIG_HIGHMEM Index: linux-2.6/kernel/power/suspend.c =================================================================== --- linux-2.6.orig/kernel/power/suspend.c +++ linux-2.6/kernel/power/suspend.c @@ -172,7 +172,7 @@ static int suspend_enter(suspend_state_t error = sysdev_suspend(PMSG_SUSPEND); if (!error) { - if (!suspend_test(TEST_CORE)) + if (!suspend_test(TEST_CORE) && pm_check_wakeup_events_final()) error = suspend_ops->enter(state); sysdev_resume(); } Index: linux-2.6/kernel/power/hibernate.c =================================================================== --- linux-2.6.orig/kernel/power/hibernate.c +++ linux-2.6/kernel/power/hibernate.c @@ -277,7 +277,7 @@ static int create_image(int platform_mod goto Enable_irqs; } - if (hibernation_test(TEST_CORE)) + if (hibernation_test(TEST_CORE) || !pm_check_wakeup_events()) goto Power_up; in_suspend = 1; @@ -511,14 +511,18 @@ int hibernation_platform_enter(void) local_irq_disable(); sysdev_suspend(PMSG_HIBERNATE); + if (!pm_check_wakeup_events_final()) + goto Power_up; + hibernation_ops->enter(); /* We should never get here */ while (1); - /* - * We don't need to reenable the nonboot CPUs or resume consoles, since - * the system is going to be halted anyway. - */ + Power_up: + sysdev_resume(); + local_irq_enable(); + enable_nonboot_cpus(); + Platform_finish: hibernation_ops->finish(); Index: linux-2.6/drivers/pci/pci-acpi.c =================================================================== --- linux-2.6.orig/drivers/pci/pci-acpi.c +++ linux-2.6/drivers/pci/pci-acpi.c @@ -48,6 +48,8 @@ static void pci_acpi_wake_dev(acpi_handl if (event == ACPI_NOTIFY_DEVICE_WAKE && pci_dev) { pci_check_pme_status(pci_dev); pm_runtime_resume(&pci_dev->dev); + if (device_may_wakeup(&pci_dev->dev)) + pm_wakeup_event(&pci_dev->dev); if (pci_dev->subordinate) pci_pme_wakeup_bus(pci_dev->subordinate); } Index: linux-2.6/drivers/pci/pcie/pme/pcie_pme.c =================================================================== --- linux-2.6.orig/drivers/pci/pcie/pme/pcie_pme.c +++ linux-2.6/drivers/pci/pcie/pme/pcie_pme.c @@ -154,6 +154,8 @@ static bool pcie_pme_walk_bus(struct pci /* Skip PCIe devices in case we started from a root port. */ if (!pci_is_pcie(dev) && pci_check_pme_status(dev)) { pm_request_resume(&dev->dev); + if (device_may_wakeup(&dev->dev)) + pm_wakeup_event(&dev->dev); ret = true; } Index: linux-2.6/drivers/base/power/power.h =================================================================== --- linux-2.6.orig/drivers/base/power/power.h +++ linux-2.6/drivers/base/power/power.h @@ -30,6 +30,9 @@ extern void device_pm_move_before(struct extern void device_pm_move_after(struct device *, struct device *); extern void device_pm_move_last(struct device *); +/* drivers/base/power/wakeup.c */ +extern unsigned long pm_dev_wakeup_count(struct device *dev); + #else /* !CONFIG_PM_SLEEP */ static inline void device_pm_init(struct device *dev) Index: linux-2.6/drivers/base/power/sysfs.c =================================================================== --- linux-2.6.orig/drivers/base/power/sysfs.c +++ linux-2.6/drivers/base/power/sysfs.c @@ -144,6 +144,14 @@ wake_store(struct device * dev, struct d static DEVICE_ATTR(wakeup, 0644, wake_show, wake_store); +static ssize_t wakeup_count_show(struct device *dev, + struct device_attribute *attr, char *buf) +{ + return sprintf(buf, "%lu\n", pm_dev_wakeup_count(dev)); +} + +static DEVICE_ATTR(wakeup_count, 0444, wakeup_count_show, NULL); + #ifdef CONFIG_PM_ADVANCED_DEBUG #ifdef CONFIG_PM_RUNTIME @@ -230,6 +238,7 @@ static struct attribute * power_attrs[] &dev_attr_control.attr, #endif &dev_attr_wakeup.attr, + &dev_attr_wakeup_count.attr, #ifdef CONFIG_PM_ADVANCED_DEBUG &dev_attr_async.attr, #ifdef CONFIG_PM_RUNTIME ^ permalink raw reply [flat|nested] 75+ messages in thread
* Re: [update] Re: [RFC][PATCH] PM: Avoid losing wakeup events during suspend 2010-06-22 19:59 ` [update] " Rafael J. Wysocki @ 2010-06-22 20:34 ` Alan Stern 2010-06-22 21:41 ` Rafael J. Wysocki 0 siblings, 1 reply; 75+ messages in thread From: Alan Stern @ 2010-06-22 20:34 UTC (permalink / raw) To: Rafael J. Wysocki Cc: Florian Mickler, Linux-pm mailing list, Matthew Garrett, Linux Kernel Mailing List, Dmitry Torokhov, Arve Hjønnevåg, Neil Brown, mark gross On Tue, 22 Jun 2010, Rafael J. Wysocki wrote: > Anyway, below's an update that addresses this particular case. > > It adds two more functions, pm_wakeup_begin() and pm_wakeup_end() > that play similar roles to suspend_block() and suspend_unblock(), but they > don't operate on suspend blocker objects. Instead, the first of them increases > a counter of events in progress and the other one decreases this counter. > Together they have the same effect as pm_wakeup_event(), but the counter > of wakeup events in progress they operate on is also checked by > pm_check_wakeup_events(). > > Thus there are two ways kernel subsystems can signal wakeup events. First, > if the event is not explicitly handed over to user space and "instantaneous", > they can simply call pm_wakeup_event() and be done with it. Second, if the > event is going to be delivered to user space, the subsystem that processes > the event can call pm_wakeup_begin() right when the event is detected and > pm_wakeup_end() when it's been handed over to user space. Or if the event is going to be handled entirely in the kernel but over a prolonged period of time. > Please tell me what you think. I like it a lot. It addresses the main weakness in the earlier version. With this, you could essentially duplicate the in-kernel part of the wakelocks/suspend blockers stuff. All except the timed wakelocks -- you might want to consider adding a pm_wakeup_begin_timeout() convenience routine. Here's another possible enhancement (if you can figure out a way to do it without too much effort): After a suspend begins, keep track of the first wakeup event you get. Then when the suspend is aborted, print a log message saying why and indicating which device was responsible for the wakeup. One little thing: You have the PCI subsystem call pm_wakeup_event(). If the driver then wants to call pm_wakeup_begin(), the event will get counted twice. I guess this doesn't matter much, but it does seem peculiar. Alan Stern ^ permalink raw reply [flat|nested] 75+ messages in thread
* Re: [update] Re: [RFC][PATCH] PM: Avoid losing wakeup events during suspend 2010-06-22 20:34 ` Alan Stern @ 2010-06-22 21:41 ` Rafael J. Wysocki 2010-06-23 2:12 ` Alan Stern 0 siblings, 1 reply; 75+ messages in thread From: Rafael J. Wysocki @ 2010-06-22 21:41 UTC (permalink / raw) To: Alan Stern Cc: Florian Mickler, Linux-pm mailing list, Matthew Garrett, Linux Kernel Mailing List, Dmitry Torokhov, Arve Hjønnevåg, Neil Brown, mark gross On Tuesday, June 22, 2010, Alan Stern wrote: > On Tue, 22 Jun 2010, Rafael J. Wysocki wrote: > > > Anyway, below's an update that addresses this particular case. > > > > It adds two more functions, pm_wakeup_begin() and pm_wakeup_end() > > that play similar roles to suspend_block() and suspend_unblock(), but they > > don't operate on suspend blocker objects. Instead, the first of them increases > > a counter of events in progress and the other one decreases this counter. > > Together they have the same effect as pm_wakeup_event(), but the counter > > of wakeup events in progress they operate on is also checked by > > pm_check_wakeup_events(). > > > > Thus there are two ways kernel subsystems can signal wakeup events. First, > > if the event is not explicitly handed over to user space and "instantaneous", > > they can simply call pm_wakeup_event() and be done with it. Second, if the > > event is going to be delivered to user space, the subsystem that processes > > the event can call pm_wakeup_begin() right when the event is detected and > > pm_wakeup_end() when it's been handed over to user space. > > Or if the event is going to be handled entirely in the kernel but over > a prolonged period of time. > > > Please tell me what you think. > > I like it a lot. It addresses the main weakness in the earlier > version. With this, you could essentially duplicate the in-kernel part > of the wakelocks/suspend blockers stuff. All except the timed > wakelocks -- you might want to consider adding a > pm_wakeup_begin_timeout() convenience routine. That may be added in future quite easily if it really turns out to be necessary. IIRC Arve said Android only used timeouts in user space wakelocks, not in the kernel ones. > Here's another possible enhancement (if you can figure out a way to do > it without too much effort): After a suspend begins, keep track of the > first wakeup event you get. Then when the suspend is aborted, print a > log message saying why and indicating which device was responsible for > the wakeup. Good idea, but I'd prefer to add it in a separate patch not to complicate things too much to start with. > One little thing: You have the PCI subsystem call pm_wakeup_event(). > If the driver then wants to call pm_wakeup_begin(), the event will get > counted twice. I guess this doesn't matter much, but it does seem > peculiar. Knowing that the PCI core has increased the wakeup count of its device, a PCI driver may simply use pm_wakeup_begin(NULL) and that will only cause the main counter to be increased in the end. Which kind of makes sense, because in that case there really is a sequence of events. First, the PCI core detects a wakeup signal and requests wakeup so that the driver has a chance to access the device and get the event information from it (although at this point it is not known whether or not the driver will need to do that). Second, the driver requests that the system stay in the working state, because it needs time to process the event data and (presumably) hand it over to user space. The device has only signaled wakeup once, though, and that should be recorded. BTW, I thought I would make pm_get_wakeup_count() and pm_save_wakeup_count() fail if they are called when events_in_progress is nonzero. For pm_save_wakeup_count() that's pretty obvious (I think) and it also kind of makes sense for pm_get_wakeup_count(), because that will tell the reader of /sys/power/wakeup_count that the value is going to change immediately so it should really try again. Rafael ^ permalink raw reply [flat|nested] 75+ messages in thread
* Re: [update] Re: [RFC][PATCH] PM: Avoid losing wakeup events during suspend 2010-06-22 21:41 ` Rafael J. Wysocki @ 2010-06-23 2:12 ` Alan Stern 2010-06-23 10:09 ` Rafael J. Wysocki 0 siblings, 1 reply; 75+ messages in thread From: Alan Stern @ 2010-06-23 2:12 UTC (permalink / raw) To: Rafael J. Wysocki Cc: Florian Mickler, Linux-pm mailing list, Matthew Garrett, Linux Kernel Mailing List, Dmitry Torokhov, Arve Hjønnevåg, Neil Brown, mark gross On Tue, 22 Jun 2010, Rafael J. Wysocki wrote: > > > Please tell me what you think. > > > > I like it a lot. It addresses the main weakness in the earlier > > version. With this, you could essentially duplicate the in-kernel part > > of the wakelocks/suspend blockers stuff. All except the timed > > wakelocks -- you might want to consider adding a > > pm_wakeup_begin_timeout() convenience routine. > > That may be added in future quite easily if it really turns out to be necessary. > IIRC Arve said Android only used timeouts in user space wakelocks, not in the > kernel ones. Didn't we agree that timeouts would be needed for Wake-on-LAN? > > Here's another possible enhancement (if you can figure out a way to do > > it without too much effort): After a suspend begins, keep track of the > > first wakeup event you get. Then when the suspend is aborted, print a > > log message saying why and indicating which device was responsible for > > the wakeup. > > Good idea, but I'd prefer to add it in a separate patch not to complicate > things too much to start with. Okay. Another thing to be considered later is whether there should be a way to write to /sys/power/state that would block until the active wakeup count drops to 0. On the other hand polling maybe once per second wouldn't be so bad. It would happen only when the kernel had some events outstanding and userspace didn't. One thing that stands out is the new spinlock. It could potentially be a big source of contention. Any wakeup-enabled device is liable to need it during every interrupt. Do you think this could cause a noticeable slowdown? > > One little thing: You have the PCI subsystem call pm_wakeup_event(). > > If the driver then wants to call pm_wakeup_begin(), the event will get > > counted twice. I guess this doesn't matter much, but it does seem > > peculiar. > > Knowing that the PCI core has increased the wakeup count of its device, a > PCI driver may simply use pm_wakeup_begin(NULL) and that will only cause > the main counter to be increased in the end. Which kind of makes sense, > because in that case there really is a sequence of events. First, the PCI core > detects a wakeup signal and requests wakeup so that the driver has a chance > to access the device and get the event information from it (although at this > point it is not known whether or not the driver will need to do that). Second, > the driver requests that the system stay in the working state, because it needs > time to process the event data and (presumably) hand it over to user space. > The device has only signaled wakeup once, though, and that should be recorded. Okay, that works. Although if anybody wanted to keep track of timing statistics, the results wouldn't be as effective since the start and end times would not be associated with the device. > BTW, I thought I would make pm_get_wakeup_count() and pm_save_wakeup_count() > fail if they are called when events_in_progress is nonzero. For > pm_save_wakeup_count() that's pretty obvious (I think) and it also kind of > makes sense for pm_get_wakeup_count(), because that will tell the reader of > /sys/power/wakeup_count that the value is going to change immediately so it > should really try again. Sensible. Alan Stern ^ permalink raw reply [flat|nested] 75+ messages in thread
* Re: [update] Re: [RFC][PATCH] PM: Avoid losing wakeup events during suspend 2010-06-23 2:12 ` Alan Stern @ 2010-06-23 10:09 ` Rafael J. Wysocki 2010-06-23 15:21 ` Alan Stern 0 siblings, 1 reply; 75+ messages in thread From: Rafael J. Wysocki @ 2010-06-23 10:09 UTC (permalink / raw) To: Alan Stern Cc: Florian Mickler, Linux-pm mailing list, Matthew Garrett, Linux Kernel Mailing List, Dmitry Torokhov, Arve Hjønnevåg, Neil Brown, mark gross On Wednesday, June 23, 2010, Alan Stern wrote: > On Tue, 22 Jun 2010, Rafael J. Wysocki wrote: > > > > > Please tell me what you think. > > > > > > I like it a lot. It addresses the main weakness in the earlier > > > version. With this, you could essentially duplicate the in-kernel part > > > of the wakelocks/suspend blockers stuff. All except the timed > > > wakelocks -- you might want to consider adding a > > > pm_wakeup_begin_timeout() convenience routine. > > > > That may be added in future quite easily if it really turns out to be necessary. > > IIRC Arve said Android only used timeouts in user space wakelocks, not in the > > kernel ones. > > Didn't we agree that timeouts would be needed for Wake-on-LAN? Yes, we did, but in the WoL case the timeout will have to be used by the user space rather than the kernel IMO. It would make sense to add a timeout argument to pm_wakeup_event() that would make the system stay in the working state long enough for the driver wakeup code to start in the PCIe case. I think pm_wakeup_event() mght just increment events_in_progress and the timer might simply decrement it. So, maybe it's just better to have pm_wakeup_event(dev, timeout) that will increment events_in_progress and set up a timer and pm_wakeup_commit(dev) that will delete the timer, decrement events_in_progress and increment event_count (unless the timer has already expired before). That would cost us a (one more) timer in struct dev_pm_info, but it would allow us to cover all of the cases identified so far. So, if a wakeup event is handled within one functional unit that both detects it and delivers it to user space, it would call pm_wakeup_event(dev, 0) (ie. infinite timeout) at the beginning and then pm_wakeup_commit(dev) when it's done with the event. If a wakeup event it's just detected by one piece of code and is going to be handled by another, the first one could call pm_wakeup_event(dev, tm) and allow the other one to call pm_wakeup_commit(dev) when it's done. However, calling pm_wakeup_commit(dev) is not mandatory, so the second piece of code (eg. a PCI driver) doesn't really need to do anything in the simplest case. > > > Here's another possible enhancement (if you can figure out a way to do > > > it without too much effort): After a suspend begins, keep track of the > > > first wakeup event you get. Then when the suspend is aborted, print a > > > log message saying why and indicating which device was responsible for > > > the wakeup. > > > > Good idea, but I'd prefer to add it in a separate patch not to complicate > > things too much to start with. > > Okay. Another thing to be considered later is whether there should be > a way to write to /sys/power/state that would block until the active > wakeup count drops to 0. On the other hand polling maybe once per > second wouldn't be so bad. It would happen only when the kernel had > some events outstanding and userspace didn't. Blocking on a write to /sys/power/state wouldn't help, because if the active wakeup count is nonzero at this point, the suspend should be aborted anyway, so there's no need to wait. The same applies to writing to /sys/power/wakeup_count. However, blocking a read from /sys/power/wakeup_count instead of failing it when the active wakeup count is nonzero would make sense. > One thing that stands out is the new spinlock. It could potentially be > a big source of contention. Any wakeup-enabled device is liable to > need it during every interrupt. Do you think this could cause a > noticeable slowdown? That really depends on the number of wakeup devices. However, ISTR the original wakelocks code had exactly the same issue (it used a spinlock to protect the lists of wakelocks). Rafael ^ permalink raw reply [flat|nested] 75+ messages in thread
* Re: [update] Re: [RFC][PATCH] PM: Avoid losing wakeup events during suspend 2010-06-23 10:09 ` Rafael J. Wysocki @ 2010-06-23 15:21 ` Alan Stern 2010-06-23 22:17 ` Rafael J. Wysocki 0 siblings, 1 reply; 75+ messages in thread From: Alan Stern @ 2010-06-23 15:21 UTC (permalink / raw) To: Rafael J. Wysocki Cc: Florian Mickler, Linux-pm mailing list, Matthew Garrett, Linux Kernel Mailing List, Dmitry Torokhov, Arve Hjønnevåg, Neil Brown, mark gross On Wed, 23 Jun 2010, Rafael J. Wysocki wrote: > > Didn't we agree that timeouts would be needed for Wake-on-LAN? > > Yes, we did, but in the WoL case the timeout will have to be used by the user > space rather than the kernel IMO. The kernel will still have to specify some small initial timeout. Just long enough for userspace to realize what has happened and start its own critical section. > It would make sense to add a timeout argument to pm_wakeup_event() that would > make the system stay in the working state long enough for the driver wakeup > code to start in the PCIe case. I think pm_wakeup_event() mght just increment > events_in_progress and the timer might simply decrement it. Hmm. I was thinking about a similar problem with the USB hub driver. Maybe a better answer for this particular issue is to change the workqueue code. Don't allow a work thread to enter the freezer until its queue is empty. Then you wouldn't need a timeout. > So, maybe it's just better to have pm_wakeup_event(dev, timeout) that will > increment events_in_progress and set up a timer and pm_wakeup_commit(dev) that > will delete the timer, decrement events_in_progress and increment event_count > (unless the timer has already expired before). > > That would cost us a (one more) timer in struct dev_pm_info, but it would > allow us to cover all of the cases identified so far. So, if a wakeup event is > handled within one functional unit that both detects it and delivers it to > user space, it would call pm_wakeup_event(dev, 0) (ie. infinite timeout) at the > beginning and then pm_wakeup_commit(dev) when it's done with the event. > If a wakeup event it's just detected by one piece of code and is going to > be handled by another, the first one could call pm_wakeup_event(dev, tm) and > allow the other one to call pm_wakeup_commit(dev) when it's done. However, > calling pm_wakeup_commit(dev) is not mandatory, so the second piece of code > (eg. a PCI driver) doesn't really need to do anything in the simplest case. You have to handle the case where pm_wakeup_commit() gets called after the timer expires (it should do nothing). And what happens if the device gets a second wakeup event before the timer for the first one expires? dev_pm_info would need to store a count of pending events. In fact, this gets even worse. What if the second event causes you to move the timeout forward, but then you get a commit for the second event before the original timer would have expired? It's not clear that timeouts and early commit work well together. You could consider changing some of the new function names. Instead of "wakeup" (which implies that the system was asleep previously) use "awake" (which implies that you want to prevent the system from going to sleep, as in "stay awake"). > > One thing that stands out is the new spinlock. It could potentially be > > a big source of contention. Any wakeup-enabled device is liable to > > need it during every interrupt. Do you think this could cause a > > noticeable slowdown? > > That really depends on the number of wakeup devices. However, ISTR the > original wakelocks code had exactly the same issue (it used a spinlock to > protect the lists of wakelocks). Yeah, that's right. I have already forgotten the details of how that original design worked. Alan Stern ^ permalink raw reply [flat|nested] 75+ messages in thread
* Re: [update] Re: [RFC][PATCH] PM: Avoid losing wakeup events during suspend 2010-06-23 15:21 ` Alan Stern @ 2010-06-23 22:17 ` Rafael J. Wysocki 2010-06-24 13:13 ` [update 2] " Rafael J. Wysocki 0 siblings, 1 reply; 75+ messages in thread From: Rafael J. Wysocki @ 2010-06-23 22:17 UTC (permalink / raw) To: Alan Stern Cc: Florian Mickler, Linux-pm mailing list, Matthew Garrett, Linux Kernel Mailing List, Dmitry Torokhov, Arve Hjønnevåg, Neil Brown, mark gross On Wednesday, June 23, 2010, Alan Stern wrote: > On Wed, 23 Jun 2010, Rafael J. Wysocki wrote: > > > > Didn't we agree that timeouts would be needed for Wake-on-LAN? > > > > Yes, we did, but in the WoL case the timeout will have to be used by the user > > space rather than the kernel IMO. > > The kernel will still have to specify some small initial timeout. Just > long enough for userspace to realize what has happened and start its > own critical section. > > > It would make sense to add a timeout argument to pm_wakeup_event() that would > > make the system stay in the working state long enough for the driver wakeup > > code to start in the PCIe case. I think pm_wakeup_event() mght just increment > > events_in_progress and the timer might simply decrement it. > > Hmm. I was thinking about a similar problem with the USB hub driver. > > Maybe a better answer for this particular issue is to change the > workqueue code. Don't allow a work thread to enter the freezer until > its queue is empty. Then you wouldn't need a timeout. > > > So, maybe it's just better to have pm_wakeup_event(dev, timeout) that will > > increment events_in_progress and set up a timer and pm_wakeup_commit(dev) that > > will delete the timer, decrement events_in_progress and increment event_count > > (unless the timer has already expired before). > > > > That would cost us a (one more) timer in struct dev_pm_info, but it would > > allow us to cover all of the cases identified so far. So, if a wakeup event is > > handled within one functional unit that both detects it and delivers it to > > user space, it would call pm_wakeup_event(dev, 0) (ie. infinite timeout) at the > > beginning and then pm_wakeup_commit(dev) when it's done with the event. > > If a wakeup event it's just detected by one piece of code and is going to > > be handled by another, the first one could call pm_wakeup_event(dev, tm) and > > allow the other one to call pm_wakeup_commit(dev) when it's done. However, > > calling pm_wakeup_commit(dev) is not mandatory, so the second piece of code > > (eg. a PCI driver) doesn't really need to do anything in the simplest case. > > You have to handle the case where pm_wakeup_commit() gets called after > the timer expires (it should do nothing). Yup. > And what happens if the device gets a second wakeup event before the timer > for the first one expires? Good question. I don't have an answer to it at the moment, but it seems to arise from using a single timer for all events. It looks like it's simpler to make pm_wakeup_event() allocate a timer for each event and make the timer function remove it. That would cause suspend to be blocked until the timer expires without a way to cancel it earlier, though. > dev_pm_info would need to store a count of pending events. I'm not sure if that helps. It would if both events were going to be handled in the same way, but I'm not sure if we can safely assume this. > In fact, this gets even worse. What if the second event causes you to > move the timeout forward, but then you get a commit for the second > event before the original timer would have expired? It's not clear > that timeouts and early commit work well together. I think they generally do, but there are problems here, as you noted. > You could consider changing some of the new function names. Instead of > "wakeup" (which implies that the system was asleep previously) use > "awake" (which implies that you want to prevent the system from going > to sleep, as in "stay awake"). A wakeup event may be defined as an event that would cause the system to wakeup if it were is a sleep state, so I think the name of pm_wakeup_event() is fine. The other names might be better. Rafael ^ permalink raw reply [flat|nested] 75+ messages in thread
* [update 2] Re: [RFC][PATCH] PM: Avoid losing wakeup events during suspend 2010-06-23 22:17 ` Rafael J. Wysocki @ 2010-06-24 13:13 ` Rafael J. Wysocki 2010-06-24 15:06 ` Rafael J. Wysocki 2010-06-24 15:44 ` [update 2] " Alan Stern 0 siblings, 2 replies; 75+ messages in thread From: Rafael J. Wysocki @ 2010-06-24 13:13 UTC (permalink / raw) To: Alan Stern Cc: Florian Mickler, Linux-pm mailing list, Matthew Garrett, Linux Kernel Mailing List, Dmitry Torokhov, Arve Hjønnevåg, Neil Brown, mark gross On Thursday, June 24, 2010, Rafael J. Wysocki wrote: > On Wednesday, June 23, 2010, Alan Stern wrote: > > On Wed, 23 Jun 2010, Rafael J. Wysocki wrote: > > > > > > Didn't we agree that timeouts would be needed for Wake-on-LAN? > > > > > > Yes, we did, but in the WoL case the timeout will have to be used by the user > > > space rather than the kernel IMO. > > > > The kernel will still have to specify some small initial timeout. Just > > long enough for userspace to realize what has happened and start its > > own critical section. > > > > > It would make sense to add a timeout argument to pm_wakeup_event() that would > > > make the system stay in the working state long enough for the driver wakeup > > > code to start in the PCIe case. I think pm_wakeup_event() mght just increment > > > events_in_progress and the timer might simply decrement it. > > > > Hmm. I was thinking about a similar problem with the USB hub driver. > > > > Maybe a better answer for this particular issue is to change the > > workqueue code. Don't allow a work thread to enter the freezer until > > its queue is empty. Then you wouldn't need a timeout. > > > > > So, maybe it's just better to have pm_wakeup_event(dev, timeout) that will > > > increment events_in_progress and set up a timer and pm_wakeup_commit(dev) that > > > will delete the timer, decrement events_in_progress and increment event_count > > > (unless the timer has already expired before). > > > > > > That would cost us a (one more) timer in struct dev_pm_info, but it would > > > allow us to cover all of the cases identified so far. So, if a wakeup event is > > > handled within one functional unit that both detects it and delivers it to > > > user space, it would call pm_wakeup_event(dev, 0) (ie. infinite timeout) at the > > > beginning and then pm_wakeup_commit(dev) when it's done with the event. > > > If a wakeup event it's just detected by one piece of code and is going to > > > be handled by another, the first one could call pm_wakeup_event(dev, tm) and > > > allow the other one to call pm_wakeup_commit(dev) when it's done. However, > > > calling pm_wakeup_commit(dev) is not mandatory, so the second piece of code > > > (eg. a PCI driver) doesn't really need to do anything in the simplest case. > > > > You have to handle the case where pm_wakeup_commit() gets called after > > the timer expires (it should do nothing). > > Yup. > > > And what happens if the device gets a second wakeup event before the timer > > for the first one expires? > > Good question. I don't have an answer to it at the moment, but it seems to > arise from using a single timer for all events. > > It looks like it's simpler to make pm_wakeup_event() allocate a timer for each > event and make the timer function remove it. That would cause suspend to > be blocked until the timer expires without a way to cancel it earlier, though. So, I decided to try this after all. Below is a new version of the patch. It introduces pm_stay_awake(dev) and pm_relax() that play the roles of the "old" pm_wakeup_begin() and pm_wakeup_end(). pm_wakeup_event() now takes an extra timeout argument and uses it for deferred execution of pm_relax(). So, one can either use the pm_stay_awake(dev) / pm_relax() pair, or use pm_wakeup_event(dev, timeout) if the ending is under someone else's control. In addition to that, pm_get_wakeup_count() blocks until events_in_progress is zero. Please tell me what you think. Rafael --- drivers/base/power/Makefile | 2 drivers/base/power/main.c | 1 drivers/base/power/power.h | 3 drivers/base/power/sysfs.c | 9 ++ drivers/base/power/wakeup.c | 150 ++++++++++++++++++++++++++++++++++++++++ drivers/pci/pci-acpi.c | 2 drivers/pci/pci.c | 5 + drivers/pci/pci.h | 2 drivers/pci/pcie/pme/pcie_pme.c | 7 + include/linux/pm.h | 10 ++ kernel/power/hibernate.c | 18 +++- kernel/power/main.c | 24 ++++++ kernel/power/power.h | 7 + kernel/power/suspend.c | 2 14 files changed, 232 insertions(+), 10 deletions(-) Index: linux-2.6/kernel/power/main.c =================================================================== --- linux-2.6.orig/kernel/power/main.c +++ linux-2.6/kernel/power/main.c @@ -204,6 +204,28 @@ static ssize_t state_store(struct kobjec power_attr(state); +static ssize_t wakeup_count_show(struct kobject *kobj, + struct kobj_attribute *attr, + char *buf) +{ + return sprintf(buf, "%lu\n", pm_get_wakeup_count()); +} + +static ssize_t wakeup_count_store(struct kobject *kobj, + struct kobj_attribute *attr, + const char *buf, size_t n) +{ + unsigned long val; + + if (sscanf(buf, "%lu", &val) == 1) { + if (pm_save_wakeup_count(val)) + return n; + } + return -EINVAL; +} + +power_attr(wakeup_count); + #ifdef CONFIG_PM_TRACE int pm_trace_enabled; @@ -236,6 +258,7 @@ static struct attribute * g[] = { #endif #ifdef CONFIG_PM_SLEEP &pm_async_attr.attr, + &wakeup_count_attr.attr, #ifdef CONFIG_PM_DEBUG &pm_test_attr.attr, #endif @@ -266,6 +289,7 @@ static int __init pm_init(void) int error = pm_start_workqueue(); if (error) return error; + pm_wakeup_events_init(); power_kobj = kobject_create_and_add("power", NULL); if (!power_kobj) return -ENOMEM; Index: linux-2.6/drivers/base/power/wakeup.c =================================================================== --- /dev/null +++ linux-2.6/drivers/base/power/wakeup.c @@ -0,0 +1,150 @@ + +#include <linux/device.h> +#include <linux/slab.h> +#include <linux/sched.h> +#include <linux/pm.h> + +static unsigned long event_count; +static unsigned long saved_event_count; +static unsigned long events_in_progress; +static bool events_check_enabled; +static spinlock_t events_lock; +static DECLARE_WAIT_QUEUE_HEAD(events_wait_queue); + +void pm_wakeup_events_init(void) +{ + spin_lock_init(&events_lock); +} + +void pm_stay_awake(struct device *dev) +{ + unsigned long flags; + + spin_lock_irqsave(&events_lock, flags); + if (dev) + dev->power.wakeup_count++; + + events_in_progress++; + spin_unlock_irqrestore(&events_lock, flags); +} + +void pm_relax(void) +{ + unsigned long flags; + + spin_lock_irqsave(&events_lock, flags); + if (events_in_progress) { + events_in_progress--; + event_count++; + } + spin_unlock_irqrestore(&events_lock, flags); +} + +static void pm_wakeup_work_fn(struct work_struct *work) +{ + struct delayed_work *dwork = to_delayed_work(work); + + pm_relax(); + kfree(dwork); +} + +void pm_wakeup_event(struct device *dev, unsigned int msec) +{ + unsigned long flags; + struct delayed_work *dwork; + + dwork = msec ? kzalloc(sizeof(*dwork), GFP_ATOMIC) : NULL; + + spin_lock_irqsave(&events_lock, flags); + if (dev) + dev->power.wakeup_count++; + + if (dwork) { + INIT_DELAYED_WORK(dwork, pm_wakeup_work_fn); + schedule_delayed_work(dwork, msecs_to_jiffies(msec)); + + events_in_progress++; + } else { + event_count++; + } + spin_unlock_irqrestore(&events_lock, flags); +} + +static bool check_wakeup_events(void) +{ + return !events_check_enabled + || ((event_count == saved_event_count) && !events_in_progress); +} + +bool pm_check_wakeup_events(void) +{ + unsigned long flags; + bool ret; + + spin_lock_irqsave(&events_lock, flags); + ret = check_wakeup_events(); + events_check_enabled = events_check_enabled && ret; + spin_unlock_irqrestore(&events_lock, flags); + return ret; +} + +bool pm_check_wakeup_events_final(void) +{ + bool ret; + + ret = check_wakeup_events(); + events_check_enabled = false; + return ret; +} + +unsigned long pm_dev_wakeup_count(struct device *dev) +{ + unsigned long flags; + unsigned long count; + + spin_lock_irqsave(&events_lock, flags); + count = dev->power.wakeup_count; + spin_unlock_irqrestore(&events_lock, flags); + return count; +} + +unsigned long pm_get_wakeup_count(void) +{ + unsigned long count; + + spin_lock_irq(&events_lock); + events_check_enabled = false; + if (events_in_progress) { + DEFINE_WAIT(wait); + + for (;;) { + prepare_to_wait(&events_wait_queue, &wait, + TASK_INTERRUPTIBLE); + if (!events_in_progress) + break; + spin_unlock_irq(&events_lock); + + schedule(); + + spin_lock_irq(&events_lock); + } + finish_wait(&events_wait_queue, &wait); + } + count = event_count; + spin_unlock_irq(&events_lock); + return count; +} + +bool pm_save_wakeup_count(unsigned long count) +{ + bool ret = false; + + spin_lock_irq(&events_lock); + if (count == event_count && !events_in_progress) { + saved_event_count = count; + events_check_enabled = true; + ret = true; + } + spin_unlock_irq(&events_lock); + return ret; +} Index: linux-2.6/include/linux/pm.h =================================================================== --- linux-2.6.orig/include/linux/pm.h +++ linux-2.6/include/linux/pm.h @@ -457,6 +457,7 @@ struct dev_pm_info { #ifdef CONFIG_PM_SLEEP struct list_head entry; struct completion completion; + unsigned long wakeup_count; #endif #ifdef CONFIG_PM_RUNTIME struct timer_list suspend_timer; @@ -552,6 +553,11 @@ extern void __suspend_report_result(cons } while (0) extern void device_pm_wait_for_dev(struct device *sub, struct device *dev); + +/* drivers/base/power/wakeup.c */ +extern void pm_wakeup_event(struct device *dev, unsigned int msec); +extern void pm_stay_awake(struct device *dev); +extern void pm_relax(void); #else /* !CONFIG_PM_SLEEP */ #define device_pm_lock() do {} while (0) @@ -565,6 +571,10 @@ static inline int dpm_suspend_start(pm_m #define suspend_report_result(fn, ret) do {} while (0) static inline void device_pm_wait_for_dev(struct device *a, struct device *b) {} + +static inline void pm_wakeup_event(struct device *dev, unsigned int msec) {} +static inline void pm_stay_awake(struct device *dev) {} +static inline void pm_relax(void) {} #endif /* !CONFIG_PM_SLEEP */ /* How to reorder dpm_list after device_move() */ Index: linux-2.6/drivers/base/power/Makefile =================================================================== --- linux-2.6.orig/drivers/base/power/Makefile +++ linux-2.6/drivers/base/power/Makefile @@ -1,5 +1,5 @@ obj-$(CONFIG_PM) += sysfs.o -obj-$(CONFIG_PM_SLEEP) += main.o +obj-$(CONFIG_PM_SLEEP) += main.o wakeup.o obj-$(CONFIG_PM_RUNTIME) += runtime.o obj-$(CONFIG_PM_OPS) += generic_ops.o obj-$(CONFIG_PM_TRACE_RTC) += trace.o Index: linux-2.6/drivers/base/power/main.c =================================================================== --- linux-2.6.orig/drivers/base/power/main.c +++ linux-2.6/drivers/base/power/main.c @@ -59,6 +59,7 @@ void device_pm_init(struct device *dev) { dev->power.status = DPM_ON; init_completion(&dev->power.completion); + dev->power.wakeup_count = 0; pm_runtime_init(dev); } Index: linux-2.6/kernel/power/power.h =================================================================== --- linux-2.6.orig/kernel/power/power.h +++ linux-2.6/kernel/power/power.h @@ -184,6 +184,13 @@ static inline void suspend_test_finish(c #ifdef CONFIG_PM_SLEEP /* kernel/power/main.c */ extern int pm_notifier_call_chain(unsigned long val); + +/* drivers/base/power/wakeup.c */ +extern void pm_wakeup_events_init(void); +extern bool pm_check_wakeup_events(void); +extern bool pm_check_wakeup_events_final(void); +extern unsigned long pm_get_wakeup_count(void); +extern bool pm_save_wakeup_count(unsigned long count); #endif #ifdef CONFIG_HIGHMEM Index: linux-2.6/kernel/power/suspend.c =================================================================== --- linux-2.6.orig/kernel/power/suspend.c +++ linux-2.6/kernel/power/suspend.c @@ -172,7 +172,7 @@ static int suspend_enter(suspend_state_t error = sysdev_suspend(PMSG_SUSPEND); if (!error) { - if (!suspend_test(TEST_CORE)) + if (!suspend_test(TEST_CORE) && pm_check_wakeup_events_final()) error = suspend_ops->enter(state); sysdev_resume(); } Index: linux-2.6/kernel/power/hibernate.c =================================================================== --- linux-2.6.orig/kernel/power/hibernate.c +++ linux-2.6/kernel/power/hibernate.c @@ -277,7 +277,7 @@ static int create_image(int platform_mod goto Enable_irqs; } - if (hibernation_test(TEST_CORE)) + if (hibernation_test(TEST_CORE) || !pm_check_wakeup_events()) goto Power_up; in_suspend = 1; @@ -511,18 +511,24 @@ int hibernation_platform_enter(void) local_irq_disable(); sysdev_suspend(PMSG_HIBERNATE); + if (!pm_check_wakeup_events_final()) { + error = -EAGAIN; + goto Power_up; + } + hibernation_ops->enter(); /* We should never get here */ while (1); - /* - * We don't need to reenable the nonboot CPUs or resume consoles, since - * the system is going to be halted anyway. - */ + Power_up: + sysdev_resume(); + local_irq_enable(); + enable_nonboot_cpus(); + Platform_finish: hibernation_ops->finish(); - dpm_suspend_noirq(PMSG_RESTORE); + dpm_resume_noirq(PMSG_RESTORE); Resume_devices: entering_platform_hibernation = false; Index: linux-2.6/drivers/pci/pci-acpi.c =================================================================== --- linux-2.6.orig/drivers/pci/pci-acpi.c +++ linux-2.6/drivers/pci/pci-acpi.c @@ -48,6 +48,8 @@ static void pci_acpi_wake_dev(acpi_handl if (event == ACPI_NOTIFY_DEVICE_WAKE && pci_dev) { pci_check_pme_status(pci_dev); pm_runtime_resume(&pci_dev->dev); + if (device_may_wakeup(&pci_dev->dev)) + pm_wakeup_event(&pci_dev->dev, PCI_WAKEUP_COOLDOWN); if (pci_dev->subordinate) pci_pme_wakeup_bus(pci_dev->subordinate); } Index: linux-2.6/drivers/pci/pcie/pme/pcie_pme.c =================================================================== --- linux-2.6.orig/drivers/pci/pcie/pme/pcie_pme.c +++ linux-2.6/drivers/pci/pcie/pme/pcie_pme.c @@ -154,6 +154,8 @@ static bool pcie_pme_walk_bus(struct pci /* Skip PCIe devices in case we started from a root port. */ if (!pci_is_pcie(dev) && pci_check_pme_status(dev)) { pm_request_resume(&dev->dev); + if (device_may_wakeup(&dev->dev)) + pm_wakeup_event(&dev->dev, PCI_WAKEUP_COOLDOWN); ret = true; } @@ -254,8 +256,11 @@ static void pcie_pme_handle_request(stru if (found) { /* The device is there, but we have to check its PME status. */ found = pci_check_pme_status(dev); - if (found) + if (found) { pm_request_resume(&dev->dev); + if (device_may_wakeup(&dev->dev)) + pm_wakeup_event(&dev->dev, PCI_WAKEUP_COOLDOWN); + } pci_dev_put(dev); } else if (devfn) { /* Index: linux-2.6/drivers/base/power/power.h =================================================================== --- linux-2.6.orig/drivers/base/power/power.h +++ linux-2.6/drivers/base/power/power.h @@ -30,6 +30,9 @@ extern void device_pm_move_before(struct extern void device_pm_move_after(struct device *, struct device *); extern void device_pm_move_last(struct device *); +/* drivers/base/power/wakeup.c */ +extern unsigned long pm_dev_wakeup_count(struct device *dev); + #else /* !CONFIG_PM_SLEEP */ static inline void device_pm_init(struct device *dev) Index: linux-2.6/drivers/base/power/sysfs.c =================================================================== --- linux-2.6.orig/drivers/base/power/sysfs.c +++ linux-2.6/drivers/base/power/sysfs.c @@ -144,6 +144,14 @@ wake_store(struct device * dev, struct d static DEVICE_ATTR(wakeup, 0644, wake_show, wake_store); +static ssize_t wakeup_count_show(struct device *dev, + struct device_attribute *attr, char *buf) +{ + return sprintf(buf, "%lu\n", pm_dev_wakeup_count(dev)); +} + +static DEVICE_ATTR(wakeup_count, 0444, wakeup_count_show, NULL); + #ifdef CONFIG_PM_ADVANCED_DEBUG #ifdef CONFIG_PM_RUNTIME @@ -230,6 +238,7 @@ static struct attribute * power_attrs[] &dev_attr_control.attr, #endif &dev_attr_wakeup.attr, + &dev_attr_wakeup_count.attr, #ifdef CONFIG_PM_ADVANCED_DEBUG &dev_attr_async.attr, #ifdef CONFIG_PM_RUNTIME Index: linux-2.6/drivers/pci/pci.h =================================================================== --- linux-2.6.orig/drivers/pci/pci.h +++ linux-2.6/drivers/pci/pci.h @@ -6,6 +6,8 @@ #define PCI_CFG_SPACE_SIZE 256 #define PCI_CFG_SPACE_EXP_SIZE 4096 +#define PCI_WAKEUP_COOLDOWN 100 + /* Functions internal to the PCI core code */ extern int pci_uevent(struct device *dev, struct kobj_uevent_env *env); Index: linux-2.6/drivers/pci/pci.c =================================================================== --- linux-2.6.orig/drivers/pci/pci.c +++ linux-2.6/drivers/pci/pci.c @@ -1285,8 +1285,11 @@ bool pci_check_pme_status(struct pci_dev */ static int pci_pme_wakeup(struct pci_dev *dev, void *ign) { - if (pci_check_pme_status(dev)) + if (pci_check_pme_status(dev)) { pm_request_resume(&dev->dev); + if (device_may_wakeup(&dev->dev)) + pm_wakeup_event(&dev->dev, PCI_WAKEUP_COOLDOWN); + } return 0; } ^ permalink raw reply [flat|nested] 75+ messages in thread
* Re: [update 2] Re: [RFC][PATCH] PM: Avoid losing wakeup events during suspend 2010-06-24 13:13 ` [update 2] " Rafael J. Wysocki @ 2010-06-24 15:06 ` Rafael J. Wysocki 2010-06-24 15:35 ` Alan Stern 2010-06-24 15:44 ` [update 2] " Alan Stern 1 sibling, 1 reply; 75+ messages in thread From: Rafael J. Wysocki @ 2010-06-24 15:06 UTC (permalink / raw) To: Alan Stern Cc: Florian Mickler, Linux-pm mailing list, Matthew Garrett, Linux Kernel Mailing List, Dmitry Torokhov, Arve Hjønnevåg, Neil Brown, mark gross On Thursday, June 24, 2010, Rafael J. Wysocki wrote: > On Thursday, June 24, 2010, Rafael J. Wysocki wrote: > > On Wednesday, June 23, 2010, Alan Stern wrote: > > > On Wed, 23 Jun 2010, Rafael J. Wysocki wrote: > > > > > > > > Didn't we agree that timeouts would be needed for Wake-on-LAN? > > > > > > > > Yes, we did, but in the WoL case the timeout will have to be used by the user > > > > space rather than the kernel IMO. > > > > > > The kernel will still have to specify some small initial timeout. Just > > > long enough for userspace to realize what has happened and start its > > > own critical section. > > > > > > > It would make sense to add a timeout argument to pm_wakeup_event() that would > > > > make the system stay in the working state long enough for the driver wakeup > > > > code to start in the PCIe case. I think pm_wakeup_event() mght just increment > > > > events_in_progress and the timer might simply decrement it. > > > > > > Hmm. I was thinking about a similar problem with the USB hub driver. > > > > > > Maybe a better answer for this particular issue is to change the > > > workqueue code. Don't allow a work thread to enter the freezer until > > > its queue is empty. Then you wouldn't need a timeout. > > > > > > > So, maybe it's just better to have pm_wakeup_event(dev, timeout) that will > > > > increment events_in_progress and set up a timer and pm_wakeup_commit(dev) that > > > > will delete the timer, decrement events_in_progress and increment event_count > > > > (unless the timer has already expired before). > > > > > > > > That would cost us a (one more) timer in struct dev_pm_info, but it would > > > > allow us to cover all of the cases identified so far. So, if a wakeup event is > > > > handled within one functional unit that both detects it and delivers it to > > > > user space, it would call pm_wakeup_event(dev, 0) (ie. infinite timeout) at the > > > > beginning and then pm_wakeup_commit(dev) when it's done with the event. > > > > If a wakeup event it's just detected by one piece of code and is going to > > > > be handled by another, the first one could call pm_wakeup_event(dev, tm) and > > > > allow the other one to call pm_wakeup_commit(dev) when it's done. However, > > > > calling pm_wakeup_commit(dev) is not mandatory, so the second piece of code > > > > (eg. a PCI driver) doesn't really need to do anything in the simplest case. > > > > > > You have to handle the case where pm_wakeup_commit() gets called after > > > the timer expires (it should do nothing). > > > > Yup. > > > > > And what happens if the device gets a second wakeup event before the timer > > > for the first one expires? > > > > Good question. I don't have an answer to it at the moment, but it seems to > > arise from using a single timer for all events. > > > > It looks like it's simpler to make pm_wakeup_event() allocate a timer for each > > event and make the timer function remove it. That would cause suspend to > > be blocked until the timer expires without a way to cancel it earlier, though. > > So, I decided to try this after all. > > Below is a new version of the patch. It introduces pm_stay_awake(dev) and > pm_relax() that play the roles of the "old" pm_wakeup_begin() and > pm_wakeup_end(). > > pm_wakeup_event() now takes an extra timeout argument and uses it for > deferred execution of pm_relax(). So, one can either use the > pm_stay_awake(dev) / pm_relax() pair, or use pm_wakeup_event(dev, timeout) > if the ending is under someone else's control. > > In addition to that, pm_get_wakeup_count() blocks until events_in_progress is > zero. > > Please tell me what you think. Ah, one piece is missing. Namely, the waiting /sys/power/wakeup_count reader needs to be woken up when events_in_progress goes down to zero. I'll send a new version with this bug fixed later today. Rafael ^ permalink raw reply [flat|nested] 75+ messages in thread
* Re: [update 2] Re: [RFC][PATCH] PM: Avoid losing wakeup events during suspend 2010-06-24 15:06 ` Rafael J. Wysocki @ 2010-06-24 15:35 ` Alan Stern 2010-06-24 23:00 ` [update 3] " Rafael J. Wysocki 0 siblings, 1 reply; 75+ messages in thread From: Alan Stern @ 2010-06-24 15:35 UTC (permalink / raw) To: Rafael J. Wysocki Cc: Florian Mickler, Linux-pm mailing list, Matthew Garrett, Linux Kernel Mailing List, Dmitry Torokhov, Arve Hjønnevåg, Neil Brown, mark gross On Thu, 24 Jun 2010, Rafael J. Wysocki wrote: > Ah, one piece is missing. Namely, the waiting /sys/power/wakeup_count reader > needs to be woken up when events_in_progress goes down to zero. It also needs to abort immediately if a signal is received. Alan Stern ^ permalink raw reply [flat|nested] 75+ messages in thread
* [update 3] Re: [RFC][PATCH] PM: Avoid losing wakeup events during suspend 2010-06-24 15:35 ` Alan Stern @ 2010-06-24 23:00 ` Rafael J. Wysocki 2010-06-25 14:42 ` Alan Stern 0 siblings, 1 reply; 75+ messages in thread From: Rafael J. Wysocki @ 2010-06-24 23:00 UTC (permalink / raw) To: Alan Stern Cc: Florian Mickler, Linux-pm mailing list, Matthew Garrett, Linux Kernel Mailing List, Dmitry Torokhov, Arve Hjønnevåg, Neil Brown, mark gross On Thursday, June 24, 2010, Alan Stern wrote: > On Thu, 24 Jun 2010, Rafael J. Wysocki wrote: > > > Ah, one piece is missing. Namely, the waiting /sys/power/wakeup_count reader > > needs to be woken up when events_in_progress goes down to zero. > > It also needs to abort immediately if a signal is received. Right. So, there it goes. I decided not to play with memory allocations at this point, because I really don't expect pm_wakeup_event() to be heavily used initially. If there are more users and it's called more frequently, we can always switch to using a separate slab cache. Hopefully, I haven't overlooked anything vitally important this time. Please tell me what you think. Rafael --- drivers/base/power/Makefile | 2 drivers/base/power/main.c | 1 drivers/base/power/power.h | 3 drivers/base/power/sysfs.c | 9 ++ drivers/base/power/wakeup.c | 143 ++++++++++++++++++++++++++++++++++++++++ drivers/pci/pci-acpi.c | 1 drivers/pci/pci.c | 10 ++ drivers/pci/pci.h | 3 drivers/pci/pcie/pme/pcie_pme.c | 5 + include/linux/pm.h | 10 ++ kernel/power/hibernate.c | 22 ++++-- kernel/power/main.c | 26 +++++++ kernel/power/power.h | 9 ++ kernel/power/suspend.c | 4 - 14 files changed, 237 insertions(+), 11 deletions(-) Index: linux-2.6/kernel/power/main.c =================================================================== --- linux-2.6.orig/kernel/power/main.c +++ linux-2.6/kernel/power/main.c @@ -204,6 +204,30 @@ static ssize_t state_store(struct kobjec power_attr(state); +static ssize_t wakeup_count_show(struct kobject *kobj, + struct kobj_attribute *attr, + char *buf) +{ + unsigned long val; + + return pm_get_wakeup_count(&val) ? sprintf(buf, "%lu\n", val) : -EINTR; +} + +static ssize_t wakeup_count_store(struct kobject *kobj, + struct kobj_attribute *attr, + const char *buf, size_t n) +{ + unsigned long val; + + if (sscanf(buf, "%lu", &val) == 1) { + if (pm_save_wakeup_count(val)) + return n; + } + return -EINVAL; +} + +power_attr(wakeup_count); + #ifdef CONFIG_PM_TRACE int pm_trace_enabled; @@ -236,6 +260,7 @@ static struct attribute * g[] = { #endif #ifdef CONFIG_PM_SLEEP &pm_async_attr.attr, + &wakeup_count_attr.attr, #ifdef CONFIG_PM_DEBUG &pm_test_attr.attr, #endif @@ -266,6 +291,7 @@ static int __init pm_init(void) int error = pm_start_workqueue(); if (error) return error; + pm_wakeup_events_init(); power_kobj = kobject_create_and_add("power", NULL); if (!power_kobj) return -ENOMEM; Index: linux-2.6/drivers/base/power/wakeup.c =================================================================== --- /dev/null +++ linux-2.6/drivers/base/power/wakeup.c @@ -0,0 +1,143 @@ + +#include <linux/device.h> +#include <linux/slab.h> +#include <linux/sched.h> +#include <linux/capability.h> +#include <linux/pm.h> + +bool events_check_enabled; + +static unsigned long event_count; +static unsigned long saved_event_count; +static unsigned long events_in_progress; +static spinlock_t events_lock; +static DECLARE_WAIT_QUEUE_HEAD(events_wait_queue); + +void pm_wakeup_events_init(void) +{ + spin_lock_init(&events_lock); +} + +void pm_stay_awake(struct device *dev) +{ + unsigned long flags; + + spin_lock_irqsave(&events_lock, flags); + if (dev) + dev->power.wakeup_count++; + + events_in_progress++; + spin_unlock_irqrestore(&events_lock, flags); +} + +void pm_relax(void) +{ + unsigned long flags; + + spin_lock_irqsave(&events_lock, flags); + if (events_in_progress) { + event_count++; + if (!--events_in_progress) + wake_up_all(&events_wait_queue); + } + spin_unlock_irqrestore(&events_lock, flags); +} + +static void pm_wakeup_work_fn(struct work_struct *work) +{ + struct delayed_work *dwork = to_delayed_work(work); + + pm_relax(); + kfree(dwork); +} + +void pm_wakeup_event(struct device *dev, unsigned int msec) +{ + unsigned long flags; + struct delayed_work *dwork; + + dwork = msec ? kzalloc(sizeof(*dwork), GFP_ATOMIC) : NULL; + + spin_lock_irqsave(&events_lock, flags); + if (dev) + dev->power.wakeup_count++; + + if (dwork) { + INIT_DELAYED_WORK(dwork, pm_wakeup_work_fn); + schedule_delayed_work(dwork, msecs_to_jiffies(msec)); + + events_in_progress++; + } else { + event_count++; + } + spin_unlock_irqrestore(&events_lock, flags); +} + +bool pm_check_wakeup_events(void) +{ + unsigned long flags; + bool ret = true; + + spin_lock_irqsave(&events_lock, flags); + if (events_check_enabled) { + ret = (event_count == saved_event_count) && !events_in_progress; + events_check_enabled = ret; + } + spin_unlock_irqrestore(&events_lock, flags); + return ret; +} + +unsigned long pm_dev_wakeup_count(struct device *dev) +{ + unsigned long flags; + unsigned long count; + + spin_lock_irqsave(&events_lock, flags); + count = dev->power.wakeup_count; + spin_unlock_irqrestore(&events_lock, flags); + return count; +} + +bool pm_get_wakeup_count(unsigned long *count) +{ + bool ret; + + spin_lock_irq(&events_lock); + if (capable(CAP_SYS_ADMIN)) + events_check_enabled = false; + + if (events_in_progress) { + DEFINE_WAIT(wait); + + do { + prepare_to_wait(&events_wait_queue, &wait, + TASK_INTERRUPTIBLE); + if (!events_in_progress) + break; + spin_unlock_irq(&events_lock); + + schedule(); + + spin_lock_irq(&events_lock); + } while (!signal_pending(current)); + finish_wait(&events_wait_queue, &wait); + } + *count = event_count; + ret = !events_in_progress; + spin_unlock_irq(&events_lock); + return ret; +} + +bool pm_save_wakeup_count(unsigned long count) +{ + bool ret = false; + + spin_lock_irq(&events_lock); + if (count == event_count && !events_in_progress) { + saved_event_count = count; + events_check_enabled = true; + ret = true; + } + spin_unlock_irq(&events_lock); + return ret; +} Index: linux-2.6/include/linux/pm.h =================================================================== --- linux-2.6.orig/include/linux/pm.h +++ linux-2.6/include/linux/pm.h @@ -457,6 +457,7 @@ struct dev_pm_info { #ifdef CONFIG_PM_SLEEP struct list_head entry; struct completion completion; + unsigned long wakeup_count; #endif #ifdef CONFIG_PM_RUNTIME struct timer_list suspend_timer; @@ -552,6 +553,11 @@ extern void __suspend_report_result(cons } while (0) extern void device_pm_wait_for_dev(struct device *sub, struct device *dev); + +/* drivers/base/power/wakeup.c */ +extern void pm_wakeup_event(struct device *dev, unsigned int msec); +extern void pm_stay_awake(struct device *dev); +extern void pm_relax(void); #else /* !CONFIG_PM_SLEEP */ #define device_pm_lock() do {} while (0) @@ -565,6 +571,10 @@ static inline int dpm_suspend_start(pm_m #define suspend_report_result(fn, ret) do {} while (0) static inline void device_pm_wait_for_dev(struct device *a, struct device *b) {} + +static inline void pm_wakeup_event(struct device *dev, unsigned int msec) {} +static inline void pm_stay_awake(struct device *dev) {} +static inline void pm_relax(void) {} #endif /* !CONFIG_PM_SLEEP */ /* How to reorder dpm_list after device_move() */ Index: linux-2.6/drivers/base/power/Makefile =================================================================== --- linux-2.6.orig/drivers/base/power/Makefile +++ linux-2.6/drivers/base/power/Makefile @@ -1,5 +1,5 @@ obj-$(CONFIG_PM) += sysfs.o -obj-$(CONFIG_PM_SLEEP) += main.o +obj-$(CONFIG_PM_SLEEP) += main.o wakeup.o obj-$(CONFIG_PM_RUNTIME) += runtime.o obj-$(CONFIG_PM_OPS) += generic_ops.o obj-$(CONFIG_PM_TRACE_RTC) += trace.o Index: linux-2.6/drivers/base/power/main.c =================================================================== --- linux-2.6.orig/drivers/base/power/main.c +++ linux-2.6/drivers/base/power/main.c @@ -59,6 +59,7 @@ void device_pm_init(struct device *dev) { dev->power.status = DPM_ON; init_completion(&dev->power.completion); + dev->power.wakeup_count = 0; pm_runtime_init(dev); } Index: linux-2.6/kernel/power/power.h =================================================================== --- linux-2.6.orig/kernel/power/power.h +++ linux-2.6/kernel/power/power.h @@ -184,6 +184,15 @@ static inline void suspend_test_finish(c #ifdef CONFIG_PM_SLEEP /* kernel/power/main.c */ extern int pm_notifier_call_chain(unsigned long val); + +/* drivers/base/power/wakeup.c */ +extern bool events_check_enabled; + +extern void pm_wakeup_events_init(void); +extern bool pm_check_wakeup_events(void); +extern bool pm_check_wakeup_events_final(void); +extern bool pm_get_wakeup_count(unsigned long *count); +extern bool pm_save_wakeup_count(unsigned long count); #endif #ifdef CONFIG_HIGHMEM Index: linux-2.6/kernel/power/suspend.c =================================================================== --- linux-2.6.orig/kernel/power/suspend.c +++ linux-2.6/kernel/power/suspend.c @@ -172,8 +172,10 @@ static int suspend_enter(suspend_state_t error = sysdev_suspend(PMSG_SUSPEND); if (!error) { - if (!suspend_test(TEST_CORE)) + if (!suspend_test(TEST_CORE) && pm_check_wakeup_events()) { error = suspend_ops->enter(state); + events_check_enabled = false; + } sysdev_resume(); } Index: linux-2.6/kernel/power/hibernate.c =================================================================== --- linux-2.6.orig/kernel/power/hibernate.c +++ linux-2.6/kernel/power/hibernate.c @@ -277,7 +277,7 @@ static int create_image(int platform_mod goto Enable_irqs; } - if (hibernation_test(TEST_CORE)) + if (hibernation_test(TEST_CORE) || !pm_check_wakeup_events()) goto Power_up; in_suspend = 1; @@ -288,8 +288,10 @@ static int create_image(int platform_mod error); /* Restore control flow magically appears here */ restore_processor_state(); - if (!in_suspend) + if (!in_suspend) { + events_check_enabled = false; platform_leave(platform_mode); + } Power_up: sysdev_resume(); @@ -511,18 +513,24 @@ int hibernation_platform_enter(void) local_irq_disable(); sysdev_suspend(PMSG_HIBERNATE); + if (!pm_check_wakeup_events()) { + error = -EAGAIN; + goto Power_up; + } + hibernation_ops->enter(); /* We should never get here */ while (1); - /* - * We don't need to reenable the nonboot CPUs or resume consoles, since - * the system is going to be halted anyway. - */ + Power_up: + sysdev_resume(); + local_irq_enable(); + enable_nonboot_cpus(); + Platform_finish: hibernation_ops->finish(); - dpm_suspend_noirq(PMSG_RESTORE); + dpm_resume_noirq(PMSG_RESTORE); Resume_devices: entering_platform_hibernation = false; Index: linux-2.6/drivers/pci/pci-acpi.c =================================================================== --- linux-2.6.orig/drivers/pci/pci-acpi.c +++ linux-2.6/drivers/pci/pci-acpi.c @@ -48,6 +48,7 @@ static void pci_acpi_wake_dev(acpi_handl if (event == ACPI_NOTIFY_DEVICE_WAKE && pci_dev) { pci_check_pme_status(pci_dev); pm_runtime_resume(&pci_dev->dev); + pci_wakeup_event(pci_dev); if (pci_dev->subordinate) pci_pme_wakeup_bus(pci_dev->subordinate); } Index: linux-2.6/drivers/pci/pcie/pme/pcie_pme.c =================================================================== --- linux-2.6.orig/drivers/pci/pcie/pme/pcie_pme.c +++ linux-2.6/drivers/pci/pcie/pme/pcie_pme.c @@ -154,6 +154,7 @@ static bool pcie_pme_walk_bus(struct pci /* Skip PCIe devices in case we started from a root port. */ if (!pci_is_pcie(dev) && pci_check_pme_status(dev)) { pm_request_resume(&dev->dev); + pci_wakeup_event(dev); ret = true; } @@ -254,8 +255,10 @@ static void pcie_pme_handle_request(stru if (found) { /* The device is there, but we have to check its PME status. */ found = pci_check_pme_status(dev); - if (found) + if (found) { pm_request_resume(&dev->dev); + pci_wakeup_event(dev); + } pci_dev_put(dev); } else if (devfn) { /* Index: linux-2.6/drivers/base/power/power.h =================================================================== --- linux-2.6.orig/drivers/base/power/power.h +++ linux-2.6/drivers/base/power/power.h @@ -30,6 +30,9 @@ extern void device_pm_move_before(struct extern void device_pm_move_after(struct device *, struct device *); extern void device_pm_move_last(struct device *); +/* drivers/base/power/wakeup.c */ +extern unsigned long pm_dev_wakeup_count(struct device *dev); + #else /* !CONFIG_PM_SLEEP */ static inline void device_pm_init(struct device *dev) Index: linux-2.6/drivers/base/power/sysfs.c =================================================================== --- linux-2.6.orig/drivers/base/power/sysfs.c +++ linux-2.6/drivers/base/power/sysfs.c @@ -144,6 +144,14 @@ wake_store(struct device * dev, struct d static DEVICE_ATTR(wakeup, 0644, wake_show, wake_store); +static ssize_t wakeup_count_show(struct device *dev, + struct device_attribute *attr, char *buf) +{ + return sprintf(buf, "%lu\n", pm_dev_wakeup_count(dev)); +} + +static DEVICE_ATTR(wakeup_count, 0444, wakeup_count_show, NULL); + #ifdef CONFIG_PM_ADVANCED_DEBUG #ifdef CONFIG_PM_RUNTIME @@ -230,6 +238,7 @@ static struct attribute * power_attrs[] &dev_attr_control.attr, #endif &dev_attr_wakeup.attr, + &dev_attr_wakeup_count.attr, #ifdef CONFIG_PM_ADVANCED_DEBUG &dev_attr_async.attr, #ifdef CONFIG_PM_RUNTIME Index: linux-2.6/drivers/pci/pci.h =================================================================== --- linux-2.6.orig/drivers/pci/pci.h +++ linux-2.6/drivers/pci/pci.h @@ -6,6 +6,8 @@ #define PCI_CFG_SPACE_SIZE 256 #define PCI_CFG_SPACE_EXP_SIZE 4096 +#define PCI_WAKEUP_COOLDOWN 100 + /* Functions internal to the PCI core code */ extern int pci_uevent(struct device *dev, struct kobj_uevent_env *env); @@ -56,6 +58,7 @@ extern void pci_update_current_state(str extern void pci_disable_enabled_device(struct pci_dev *dev); extern bool pci_check_pme_status(struct pci_dev *dev); extern int pci_finish_runtime_suspend(struct pci_dev *dev); +extern void pci_wakeup_event(struct pci_dev *dev); extern int __pci_pme_wakeup(struct pci_dev *dev, void *ign); extern void pci_pme_wakeup_bus(struct pci_bus *bus); extern void pci_pm_init(struct pci_dev *dev); Index: linux-2.6/drivers/pci/pci.c =================================================================== --- linux-2.6.orig/drivers/pci/pci.c +++ linux-2.6/drivers/pci/pci.c @@ -1275,6 +1275,12 @@ bool pci_check_pme_status(struct pci_dev return ret; } +void pci_wakeup_event(struct pci_dev *dev) +{ + if (device_may_wakeup(&dev->dev)) + pm_wakeup_event(&dev->dev, PCI_WAKEUP_COOLDOWN); +} + /** * pci_pme_wakeup - Wake up a PCI device if its PME Status bit is set. * @dev: Device to handle. @@ -1285,8 +1291,10 @@ bool pci_check_pme_status(struct pci_dev */ static int pci_pme_wakeup(struct pci_dev *dev, void *ign) { - if (pci_check_pme_status(dev)) + if (pci_check_pme_status(dev)) { pm_request_resume(&dev->dev); + pci_wakeup_event(dev); + } return 0; } ^ permalink raw reply [flat|nested] 75+ messages in thread
* Re: [update 3] Re: [RFC][PATCH] PM: Avoid losing wakeup events during suspend 2010-06-24 23:00 ` [update 3] " Rafael J. Wysocki @ 2010-06-25 14:42 ` Alan Stern 2010-06-25 20:33 ` Rafael J. Wysocki 0 siblings, 1 reply; 75+ messages in thread From: Alan Stern @ 2010-06-25 14:42 UTC (permalink / raw) To: Rafael J. Wysocki Cc: Florian Mickler, Linux-pm mailing list, Matthew Garrett, Linux Kernel Mailing List, Dmitry Torokhov, Arve Hjønnevåg, Neil Brown, mark gross On Fri, 25 Jun 2010, Rafael J. Wysocki wrote: > So, there it goes. > > I decided not to play with memory allocations at this point, because I really > don't expect pm_wakeup_event() to be heavily used initially. If there are more > users and it's called more frequently, we can always switch to using a separate > slab cache. > > Hopefully, I haven't overlooked anything vitally important this time. > > Please tell me what you think. Obviously comments still need to be added. Beyond that... > --- /dev/null > +++ linux-2.6/drivers/base/power/wakeup.c > @@ -0,0 +1,143 @@ > + > +#include <linux/device.h> > +#include <linux/slab.h> > +#include <linux/sched.h> > +#include <linux/capability.h> > +#include <linux/pm.h> > + > +bool events_check_enabled; > + > +static unsigned long event_count; > +static unsigned long saved_event_count; > +static unsigned long events_in_progress; > +static spinlock_t events_lock; Use static DEFINE_SPINLOCK(events_lock) instead. > +static DECLARE_WAIT_QUEUE_HEAD(events_wait_queue); > + > +void pm_wakeup_events_init(void) > +{ > + spin_lock_init(&events_lock); > +} Then this routine won't be needed. > +unsigned long pm_dev_wakeup_count(struct device *dev) > +{ > + unsigned long flags; > + unsigned long count; > + > + spin_lock_irqsave(&events_lock, flags); > + count = dev->power.wakeup_count; > + spin_unlock_irqrestore(&events_lock, flags); > + return count; > +} Are the spin_lock calls needed here? I doubt it. > --- linux-2.6.orig/kernel/power/power.h > +++ linux-2.6/kernel/power/power.h > @@ -184,6 +184,15 @@ static inline void suspend_test_finish(c > #ifdef CONFIG_PM_SLEEP > /* kernel/power/main.c */ > extern int pm_notifier_call_chain(unsigned long val); > + > +/* drivers/base/power/wakeup.c */ > +extern bool events_check_enabled; > + > +extern void pm_wakeup_events_init(void); > +extern bool pm_check_wakeup_events(void); > +extern bool pm_check_wakeup_events_final(void); > +extern bool pm_get_wakeup_count(unsigned long *count); > +extern bool pm_save_wakeup_count(unsigned long count); > #endif This is unfortunate. These declarations belong in a file that can also be #included by drivers/base/power/wakeup.c. Otherwise future changes might cause type mismatches the compiler won't be able to catch. > @@ -511,18 +513,24 @@ int hibernation_platform_enter(void) > > local_irq_disable(); > sysdev_suspend(PMSG_HIBERNATE); > + if (!pm_check_wakeup_events()) { > + error = -EAGAIN; > + goto Power_up; > + } > + > hibernation_ops->enter(); > /* We should never get here */ > while (1); > > - /* > - * We don't need to reenable the nonboot CPUs or resume consoles, since > - * the system is going to be halted anyway. > - */ > + Power_up: > + sysdev_resume(); > + local_irq_enable(); > + enable_nonboot_cpus(); > + > Platform_finish: > hibernation_ops->finish(); > > - dpm_suspend_noirq(PMSG_RESTORE); > + dpm_resume_noirq(PMSG_RESTORE); Is this a bug fix that crept in along with the other changes? > --- linux-2.6.orig/drivers/pci/pci.h > +++ linux-2.6/drivers/pci/pci.h > @@ -6,6 +6,8 @@ > #define PCI_CFG_SPACE_SIZE 256 > #define PCI_CFG_SPACE_EXP_SIZE 4096 > > +#define PCI_WAKEUP_COOLDOWN 100 This definition can go directly in pci.c, since it isn't used anywhere else. Alan Stern ^ permalink raw reply [flat|nested] 75+ messages in thread
* Re: [update 3] Re: [RFC][PATCH] PM: Avoid losing wakeup events during suspend 2010-06-25 14:42 ` Alan Stern @ 2010-06-25 20:33 ` Rafael J. Wysocki 0 siblings, 0 replies; 75+ messages in thread From: Rafael J. Wysocki @ 2010-06-25 20:33 UTC (permalink / raw) To: Alan Stern Cc: Florian Mickler, Linux-pm mailing list, Matthew Garrett, Linux Kernel Mailing List, Dmitry Torokhov, Arve Hjønnevåg, Neil Brown, mark gross On Friday, June 25, 2010, Alan Stern wrote: > On Fri, 25 Jun 2010, Rafael J. Wysocki wrote: > > > So, there it goes. > > > > I decided not to play with memory allocations at this point, because I really > > don't expect pm_wakeup_event() to be heavily used initially. If there are more > > users and it's called more frequently, we can always switch to using a separate > > slab cache. > > > > Hopefully, I haven't overlooked anything vitally important this time. > > > > Please tell me what you think. > > Obviously comments still need to be added. Indeed. > Beyond that... > > > --- /dev/null > > +++ linux-2.6/drivers/base/power/wakeup.c > > @@ -0,0 +1,143 @@ > > + > > +#include <linux/device.h> > > +#include <linux/slab.h> > > +#include <linux/sched.h> > > +#include <linux/capability.h> > > +#include <linux/pm.h> > > + > > +bool events_check_enabled; > > + > > +static unsigned long event_count; > > +static unsigned long saved_event_count; > > +static unsigned long events_in_progress; > > +static spinlock_t events_lock; > > Use static DEFINE_SPINLOCK(events_lock) instead. Hmm. I thought that was deprecated. Never mind. > > +static DECLARE_WAIT_QUEUE_HEAD(events_wait_queue); > > + > > +void pm_wakeup_events_init(void) > > +{ > > + spin_lock_init(&events_lock); > > +} > > Then this routine won't be needed. > > > +unsigned long pm_dev_wakeup_count(struct device *dev) > > +{ > > + unsigned long flags; > > + unsigned long count; > > + > > + spin_lock_irqsave(&events_lock, flags); > > + count = dev->power.wakeup_count; > > + spin_unlock_irqrestore(&events_lock, flags); > > + return count; > > +} > > Are the spin_lock calls needed here? I doubt it. No, they aren't. In fact it may be a static inline. > > --- linux-2.6.orig/kernel/power/power.h > > +++ linux-2.6/kernel/power/power.h > > @@ -184,6 +184,15 @@ static inline void suspend_test_finish(c > > #ifdef CONFIG_PM_SLEEP > > /* kernel/power/main.c */ > > extern int pm_notifier_call_chain(unsigned long val); > > + > > +/* drivers/base/power/wakeup.c */ > > +extern bool events_check_enabled; > > + > > +extern void pm_wakeup_events_init(void); > > +extern bool pm_check_wakeup_events(void); > > +extern bool pm_check_wakeup_events_final(void); > > +extern bool pm_get_wakeup_count(unsigned long *count); > > +extern bool pm_save_wakeup_count(unsigned long count); > > #endif > > This is unfortunate. These declarations belong in a file that can > also be #included by drivers/base/power/wakeup.c. Otherwise future > changes might cause type mismatches the compiler won't be able to > catch. You're right. In that case I think include/linux/suspend.h is the right header to put them into. > > @@ -511,18 +513,24 @@ int hibernation_platform_enter(void) > > > > local_irq_disable(); > > sysdev_suspend(PMSG_HIBERNATE); > > + if (!pm_check_wakeup_events()) { > > + error = -EAGAIN; > > + goto Power_up; > > + } > > + > > hibernation_ops->enter(); > > /* We should never get here */ > > while (1); > > > > - /* > > - * We don't need to reenable the nonboot CPUs or resume consoles, since > > - * the system is going to be halted anyway. > > - */ > > + Power_up: > > + sysdev_resume(); > > + local_irq_enable(); > > + enable_nonboot_cpus(); > > + > > Platform_finish: > > hibernation_ops->finish(); > > > > - dpm_suspend_noirq(PMSG_RESTORE); > > + dpm_resume_noirq(PMSG_RESTORE); > > Is this a bug fix that crept in along with the other changes? Yeah. > > --- linux-2.6.orig/drivers/pci/pci.h > > +++ linux-2.6/drivers/pci/pci.h > > @@ -6,6 +6,8 @@ > > #define PCI_CFG_SPACE_SIZE 256 > > #define PCI_CFG_SPACE_EXP_SIZE 4096 > > > > +#define PCI_WAKEUP_COOLDOWN 100 > > This definition can go directly in pci.c, since it isn't used anywhere > else. OK Thanks for the comments, Rafael ^ permalink raw reply [flat|nested] 75+ messages in thread
* Re: [update 2] Re: [RFC][PATCH] PM: Avoid losing wakeup events during suspend 2010-06-24 13:13 ` [update 2] " Rafael J. Wysocki 2010-06-24 15:06 ` Rafael J. Wysocki @ 2010-06-24 15:44 ` Alan Stern 2010-06-24 16:19 ` Rafael J. Wysocki 1 sibling, 1 reply; 75+ messages in thread From: Alan Stern @ 2010-06-24 15:44 UTC (permalink / raw) To: Rafael J. Wysocki Cc: Florian Mickler, Linux-pm mailing list, Matthew Garrett, Linux Kernel Mailing List, Dmitry Torokhov, Arve Hjønnevåg, Neil Brown, mark gross On Thu, 24 Jun 2010, Rafael J. Wysocki wrote: > > > And what happens if the device gets a second wakeup event before the timer > > > for the first one expires? > > > > Good question. I don't have an answer to it at the moment, but it seems to > > arise from using a single timer for all events. > > > > It looks like it's simpler to make pm_wakeup_event() allocate a timer for each > > event and make the timer function remove it. That would cause suspend to > > be blocked until the timer expires without a way to cancel it earlier, though. > > So, I decided to try this after all. > > Below is a new version of the patch. It introduces pm_stay_awake(dev) and > pm_relax() that play the roles of the "old" pm_wakeup_begin() and > pm_wakeup_end(). > > pm_wakeup_event() now takes an extra timeout argument and uses it for > deferred execution of pm_relax(). So, one can either use the > pm_stay_awake(dev) / pm_relax() pair, or use pm_wakeup_event(dev, timeout) > if the ending is under someone else's control. > > In addition to that, pm_get_wakeup_count() blocks until events_in_progress is > zero. > > Please tell me what you think. This is slightly different from the wakelock design. Each call to pm_stay_awake() must be paired with a call to pm_relax(), allowing one device to have multiple concurrent critical sections, whereas calls to pm_wakeup_event() must not be paired with anything. With wakelocks, you couldn't have multiple pending events for the same device. I'm not sure which model is better in practice. No doubt the Android people will prefer their way. This requires you to define an explicit PCI_WAKEUP_COOLDOWN delay. I think that's okay; I had to do something similar with USB and SCSI. (And I still think it would be a good idea to prevent workqueue threads from freezing until their queues are empty.) Instead of allocating the work structures dynamically, would you be better off using a memory pool? Alan Stern ^ permalink raw reply [flat|nested] 75+ messages in thread
* Re: [update 2] Re: [RFC][PATCH] PM: Avoid losing wakeup events during suspend 2010-06-24 15:44 ` [update 2] " Alan Stern @ 2010-06-24 16:19 ` Rafael J. Wysocki 2010-06-24 17:09 ` Alan Stern 0 siblings, 1 reply; 75+ messages in thread From: Rafael J. Wysocki @ 2010-06-24 16:19 UTC (permalink / raw) To: Alan Stern Cc: Florian Mickler, Linux-pm mailing list, Matthew Garrett, Linux Kernel Mailing List, Dmitry Torokhov, Arve Hjønnevåg, Neil Brown, mark gross On Thursday, June 24, 2010, Alan Stern wrote: > On Thu, 24 Jun 2010, Rafael J. Wysocki wrote: > > > > > And what happens if the device gets a second wakeup event before the timer > > > > for the first one expires? > > > > > > Good question. I don't have an answer to it at the moment, but it seems to > > > arise from using a single timer for all events. > > > > > > It looks like it's simpler to make pm_wakeup_event() allocate a timer for each > > > event and make the timer function remove it. That would cause suspend to > > > be blocked until the timer expires without a way to cancel it earlier, though. > > > > So, I decided to try this after all. > > > > Below is a new version of the patch. It introduces pm_stay_awake(dev) and > > pm_relax() that play the roles of the "old" pm_wakeup_begin() and > > pm_wakeup_end(). > > > > pm_wakeup_event() now takes an extra timeout argument and uses it for > > deferred execution of pm_relax(). So, one can either use the > > pm_stay_awake(dev) / pm_relax() pair, or use pm_wakeup_event(dev, timeout) > > if the ending is under someone else's control. > > > > In addition to that, pm_get_wakeup_count() blocks until events_in_progress is > > zero. > > > > Please tell me what you think. > > This is slightly different from the wakelock design. Each call to > pm_stay_awake() must be paired with a call to pm_relax(), allowing one > device to have multiple concurrent critical sections, whereas calls to > pm_wakeup_event() must not be paired with anything. With wakelocks, > you couldn't have multiple pending events for the same device. You could, but you needed to define multiple wakelocks for the same device for this purpose. > I'm not sure which model is better in practice. No doubt the Android people > will prefer their way. I suppose so. > This requires you to define an explicit PCI_WAKEUP_COOLDOWN delay. I > think that's okay; I had to do something similar with USB and SCSI. > (And I still think it would be a good idea to prevent workqueue threads > from freezing until their queues are empty.) I guess you mean the freezable ones? I'm not sure if that helps a lot, because new work items may still be added after the workqueue thread has been frozen. > Instead of allocating the work structures dynamically, would you be > better off using a memory pool? Well, it would be kind of equivalent to defining my own slab cache for that, wouldn't it? Rafael ^ permalink raw reply [flat|nested] 75+ messages in thread
* Re: [update 2] Re: [RFC][PATCH] PM: Avoid losing wakeup events during suspend 2010-06-24 16:19 ` Rafael J. Wysocki @ 2010-06-24 17:09 ` Alan Stern 2010-06-24 23:06 ` Rafael J. Wysocki 2010-06-25 6:40 ` Florian Mickler 0 siblings, 2 replies; 75+ messages in thread From: Alan Stern @ 2010-06-24 17:09 UTC (permalink / raw) To: Rafael J. Wysocki Cc: Florian Mickler, Linux-pm mailing list, Matthew Garrett, Linux Kernel Mailing List, Dmitry Torokhov, Arve Hjønnevåg, Neil Brown, mark gross On Thu, 24 Jun 2010, Rafael J. Wysocki wrote: > > This is slightly different from the wakelock design. Each call to > > pm_stay_awake() must be paired with a call to pm_relax(), allowing one > > device to have multiple concurrent critical sections, whereas calls to > > pm_wakeup_event() must not be paired with anything. With wakelocks, > > you couldn't have multiple pending events for the same device. > > You could, but you needed to define multiple wakelocks for the same device for > this purpose. Yeah, okay, but who's going to do that? > > I'm not sure which model is better in practice. No doubt the Android people > > will prefer their way. > > I suppose so. It may not make a significant difference in the end. You can always emulate the wakelock approach by not calling pm_stay_awake() when you know there is an earlier call still pending. > > This requires you to define an explicit PCI_WAKEUP_COOLDOWN delay. I > > think that's okay; I had to do something similar with USB and SCSI. > > (And I still think it would be a good idea to prevent workqueue threads > > from freezing until their queues are empty.) > > I guess you mean the freezable ones? Yes. The unfreezable workqueue threads don't have to worry about getting frozen while their queues are non-empty. :-) > I'm not sure if that helps a lot, because > new work items may still be added after the workqueue thread has been frozen. That's not the point. If a wakeup handler queues a work item (for example, by calling pm_request_resume) then it wouldn't need to guess a timeout. The work item would be guaranteed to run before the system could suspend again. > > Instead of allocating the work structures dynamically, would you be > > better off using a memory pool? > > Well, it would be kind of equivalent to defining my own slab cache for that, > wouldn't it? I suppose so. It would make the GFP_ATOMIC allocations a little more reliable. Alan Stern ^ permalink raw reply [flat|nested] 75+ messages in thread
* Re: [update 2] Re: [RFC][PATCH] PM: Avoid losing wakeup events during suspend 2010-06-24 17:09 ` Alan Stern @ 2010-06-24 23:06 ` Rafael J. Wysocki 2010-06-25 15:09 ` Alan Stern 2010-06-25 6:40 ` Florian Mickler 1 sibling, 1 reply; 75+ messages in thread From: Rafael J. Wysocki @ 2010-06-24 23:06 UTC (permalink / raw) To: Alan Stern Cc: Florian Mickler, Linux-pm mailing list, Matthew Garrett, Linux Kernel Mailing List, Dmitry Torokhov, Arve Hjønnevåg, Neil Brown, mark gross On Thursday, June 24, 2010, Alan Stern wrote: > On Thu, 24 Jun 2010, Rafael J. Wysocki wrote: > > > > This is slightly different from the wakelock design. Each call to > > > pm_stay_awake() must be paired with a call to pm_relax(), allowing one > > > device to have multiple concurrent critical sections, whereas calls to > > > pm_wakeup_event() must not be paired with anything. With wakelocks, > > > you couldn't have multiple pending events for the same device. > > > > You could, but you needed to define multiple wakelocks for the same device for > > this purpose. > > Yeah, okay, but who's going to do that? > > > > I'm not sure which model is better in practice. No doubt the Android people > > > will prefer their way. > > > > I suppose so. > > It may not make a significant difference in the end. You can always > emulate the wakelock approach by not calling pm_stay_awake() when you > know there is an earlier call still pending. > > > > This requires you to define an explicit PCI_WAKEUP_COOLDOWN delay. I > > > think that's okay; I had to do something similar with USB and SCSI. > > > (And I still think it would be a good idea to prevent workqueue threads > > > from freezing until their queues are empty.) > > > > I guess you mean the freezable ones? > > Yes. The unfreezable workqueue threads don't have to worry about > getting frozen while their queues are non-empty. :-) > > > I'm not sure if that helps a lot, because > > new work items may still be added after the workqueue thread has been frozen. > > That's not the point. If a wakeup handler queues a work item (for > example, by calling pm_request_resume) then it wouldn't need to guess a > timeout. The work item would be guaranteed to run before the system > could suspend again. You seem to be referring to the PM workqueue specifically. Perhaps it would be better to special-case it and stop it by adding a barrier work during suspend instead of just freezing? Then, it wouldn't need to be singlethread any more. Still, I think the timeout is necessary anyway in case the driver simply doesn't handle the event and user space needs time to catch up. Unfortunately, the PCI wakeup code doesn't know what happens next in advance. Rafael ^ permalink raw reply [flat|nested] 75+ messages in thread
* Re: [update 2] Re: [RFC][PATCH] PM: Avoid losing wakeup events during suspend 2010-06-24 23:06 ` Rafael J. Wysocki @ 2010-06-25 15:09 ` Alan Stern 2010-06-25 20:37 ` Rafael J. Wysocki 0 siblings, 1 reply; 75+ messages in thread From: Alan Stern @ 2010-06-25 15:09 UTC (permalink / raw) To: Rafael J. Wysocki Cc: Florian Mickler, Linux-pm mailing list, Matthew Garrett, Linux Kernel Mailing List, Dmitry Torokhov, Arve Hjønnevåg, Neil Brown, mark gross On Fri, 25 Jun 2010, Rafael J. Wysocki wrote: > > That's not the point. If a wakeup handler queues a work item (for > > example, by calling pm_request_resume) then it wouldn't need to guess a > > timeout. The work item would be guaranteed to run before the system > > could suspend again. > > You seem to be referring to the PM workqueue specifically. Perhaps it would be > better to special-case it and stop it by adding a barrier work during suspend > instead of just freezing? Then, it wouldn't need to be singlethread any more. The barrier work would have to be queued to each CPU's thread. That would be okay. Hmm, it looks like wait_event_freezable() and wait_event_freezable_timeout() could use similar changes: If the condition is true then they shouldn't try to freeze the caller. > Still, I think the timeout is necessary anyway in case the driver simply > doesn't handle the event and user space needs time to catch up. Unfortunately, > the PCI wakeup code doesn't know what happens next in advance. That could all be handled by the lower driver. Still, a 100-ms timeout isn't going to make a significant difference, since a suspend/resume cycle will take a comparable length of time. Alan Stern ^ permalink raw reply [flat|nested] 75+ messages in thread
* Re: [update 2] Re: [RFC][PATCH] PM: Avoid losing wakeup events during suspend 2010-06-25 15:09 ` Alan Stern @ 2010-06-25 20:37 ` Rafael J. Wysocki 2010-06-25 20:57 ` Alan Stern 0 siblings, 1 reply; 75+ messages in thread From: Rafael J. Wysocki @ 2010-06-25 20:37 UTC (permalink / raw) To: Alan Stern Cc: Florian Mickler, Linux-pm mailing list, Matthew Garrett, Linux Kernel Mailing List, Dmitry Torokhov, Arve Hjønnevåg, Neil Brown, mark gross On Friday, June 25, 2010, Alan Stern wrote: > On Fri, 25 Jun 2010, Rafael J. Wysocki wrote: > > > > That's not the point. If a wakeup handler queues a work item (for > > > example, by calling pm_request_resume) then it wouldn't need to guess a > > > timeout. The work item would be guaranteed to run before the system > > > could suspend again. > > > > You seem to be referring to the PM workqueue specifically. Perhaps it would be > > better to special-case it and stop it by adding a barrier work during suspend > > instead of just freezing? Then, it wouldn't need to be singlethread any more. > > The barrier work would have to be queued to each CPU's thread. That > would be okay. I guess we should stop the PM workqueue after the freezing of tasks, shouldn't we? > Hmm, it looks like wait_event_freezable() and > wait_event_freezable_timeout() could use similar changes: If the > condition is true then they shouldn't try to freeze the caller. Yes, but that should be a separate patch IMHO. Rafael ^ permalink raw reply [flat|nested] 75+ messages in thread
* Re: [update 2] Re: [RFC][PATCH] PM: Avoid losing wakeup events during suspend 2010-06-25 20:37 ` Rafael J. Wysocki @ 2010-06-25 20:57 ` Alan Stern 0 siblings, 0 replies; 75+ messages in thread From: Alan Stern @ 2010-06-25 20:57 UTC (permalink / raw) To: Rafael J. Wysocki Cc: Florian Mickler, Linux-pm mailing list, Matthew Garrett, Linux Kernel Mailing List, Dmitry Torokhov, Arve Hjønnevåg, Neil Brown, mark gross On Fri, 25 Jun 2010, Rafael J. Wysocki wrote: > > > You seem to be referring to the PM workqueue specifically. Perhaps it would be > > > better to special-case it and stop it by adding a barrier work during suspend > > > instead of just freezing? Then, it wouldn't need to be singlethread any more. > > > > The barrier work would have to be queued to each CPU's thread. That > > would be okay. > > I guess we should stop the PM workqueue after the freezing of tasks, shouldn't we? Yes. The exact spot probably doesn't matter; that's as good as any. > > Hmm, it looks like wait_event_freezable() and > > wait_event_freezable_timeout() could use similar changes: If the > > condition is true then they shouldn't try to freeze the caller. > > Yes, but that should be a separate patch IMHO. Agreed. Alan Stern ^ permalink raw reply [flat|nested] 75+ messages in thread
* Re: [update 2] Re: [RFC][PATCH] PM: Avoid losing wakeup events during suspend 2010-06-24 17:09 ` Alan Stern 2010-06-24 23:06 ` Rafael J. Wysocki @ 2010-06-25 6:40 ` Florian Mickler 2010-06-25 13:28 ` Rafael J. Wysocki 1 sibling, 1 reply; 75+ messages in thread From: Florian Mickler @ 2010-06-25 6:40 UTC (permalink / raw) To: Alan Stern Cc: Rafael J. Wysocki, Linux-pm mailing list, Matthew Garrett, Linux Kernel Mailing List, Dmitry Torokhov, Arve Hjønnevåg, Neil Brown, mark gross On Thu, 24 Jun 2010 13:09:27 -0400 (EDT) Alan Stern <stern@rowland.harvard.edu> wrote: > > > This requires you to define an explicit PCI_WAKEUP_COOLDOWN delay. I > > > think that's okay; I had to do something similar with USB and SCSI. > > > (And I still think it would be a good idea to prevent workqueue threads > > > from freezing until their queues are empty.) I'm not that familiar with the freezer, but couldn't it be deadlocky if the work depends on some already frozen part? What about a new work-type that calls pm_relax() after executing it's workfunction and executing pm_stay_awake() on enqueue? Cheers, Flo ^ permalink raw reply [flat|nested] 75+ messages in thread
* Re: [update 2] Re: [RFC][PATCH] PM: Avoid losing wakeup events during suspend 2010-06-25 6:40 ` Florian Mickler @ 2010-06-25 13:28 ` Rafael J. Wysocki 0 siblings, 0 replies; 75+ messages in thread From: Rafael J. Wysocki @ 2010-06-25 13:28 UTC (permalink / raw) To: Florian Mickler Cc: Alan Stern, Linux-pm mailing list, Matthew Garrett, Linux Kernel Mailing List, Dmitry Torokhov, Arve Hjønnevåg, Neil Brown, mark gross On Friday, June 25, 2010, Florian Mickler wrote: > On Thu, 24 Jun 2010 13:09:27 -0400 (EDT) > Alan Stern <stern@rowland.harvard.edu> wrote: > > > > > This requires you to define an explicit PCI_WAKEUP_COOLDOWN delay. I > > > > think that's okay; I had to do something similar with USB and SCSI. > > > > (And I still think it would be a good idea to prevent workqueue threads > > > > from freezing until their queues are empty.) > > I'm not that familiar with the freezer, but couldn't it be > deadlocky if the work depends on some already frozen part? No, in the case of freezable workqueues (which is the one we're discussing) they generally can't depend on anything freezable, because it's never known which freezable tasks will be frozen first. > What about a new work-type that calls > pm_relax() after executing it's workfunction and executing > pm_stay_awake() on enqueue? That might be useful., although it doesn't really help here, because there still is a window between queuing up a work item and executing it. Rafael ^ permalink raw reply [flat|nested] 75+ messages in thread
[parent not found: <201006222159.28081.rjw__37084.1419128284$1277237903$gmane$org@sisk.pl>]
* Re: [update] Re: [RFC][PATCH] PM: Avoid losing wakeup events during suspend [not found] ` <201006222159.28081.rjw__37084.1419128284$1277237903$gmane$org@sisk.pl> @ 2010-06-24 14:16 ` Andy Lutomirski 2010-06-24 14:45 ` Alan Stern 2010-06-24 14:48 ` Rafael J. Wysocki 0 siblings, 2 replies; 75+ messages in thread From: Andy Lutomirski @ 2010-06-24 14:16 UTC (permalink / raw) To: Rafael J. Wysocki Cc: Alan Stern, mark gross, Neil Brown, Dmitry Torokhov, Linux Kernel Mailing List, Arve, Florian Mickler, Linux-pm mailing list Rafael J. Wysocki wrote: > On Tuesday, June 22, 2010, Rafael J. Wysocki wrote: >> On Tuesday, June 22, 2010, Alan Stern wrote: >>> On Tue, 22 Jun 2010, Rafael J. Wysocki wrote: > ... >>>> So, even if we can say when the kernel has finished processing the event >>>> (although that would be complicated in the PCIe case above), I don't think >>>> it's generally possible to ensure that the entire processing of a wakeup event >>>> has been completed. This leads to the question whether or not it is worth >>>> trying to detect the ending of the processing of a wakeup event. >>> As Arve pointed out, in some cases it definitely is worthwhile (the >>> gpio keypad matrix example). In other cases there may be no reasonable >>> way to tell. That doesn't mean we have to give up entirely. >> Well, I'm not sure, because that really depends on the hardware and bus in >> question. The necessary condition seems to be that the event be detected >> and handled entirely by the same functional unit (eg. a device driver) within >> the kernel and such that it is able to detect whether or not user space has >> acquired the event information. That doesn't seem to be a common case to me. > > Anyway, below's an update that addresses this particular case. > > It adds two more functions, pm_wakeup_begin() and pm_wakeup_end() > that play similar roles to suspend_block() and suspend_unblock(), but they > don't operate on suspend blocker objects. Instead, the first of them increases > a counter of events in progress and the other one decreases this counter. > Together they have the same effect as pm_wakeup_event(), but the counter > of wakeup events in progress they operate on is also checked by > pm_check_wakeup_events(). > > Thus there are two ways kernel subsystems can signal wakeup events. First, > if the event is not explicitly handed over to user space and "instantaneous", > they can simply call pm_wakeup_event() and be done with it. Second, if the > event is going to be delivered to user space, the subsystem that processes > the event can call pm_wakeup_begin() right when the event is detected and > pm_wakeup_end() when it's been handed over to user space. How does userspace handle this without races? (I don't see an example in a driver that talks to userspace in your code...) For example, if I push a button on my keyboard, the driver calls pm_wakeup_begin(). Then userspace reads the key from the evdev device and tells the userspace suspend manager not to go to sleep. But there's a race: the keyboard driver (or input subsystem) could call pm_wakeup_end() before the userspace program has a chance to tell the suspend manager not to sleep. One possibility would be for poll to report that events are pending without calling pm_wakeup_end(), giving userspace a chance to prevent itself from suspending before actually reading the event. (Also, should "echo mem >/sys/power/state" be different from "echo mem_respect_suspend_blockers >/sys/power/state?" If I physically press the suspend key on my laptop, I want it to go to sleep even though I'm still holding the Fn key that was part of the suspend hotkey.) --Andy ^ permalink raw reply [flat|nested] 75+ messages in thread
* Re: [update] Re: [RFC][PATCH] PM: Avoid losing wakeup events during suspend 2010-06-24 14:16 ` [update] " Andy Lutomirski @ 2010-06-24 14:45 ` Alan Stern 2010-06-24 14:48 ` Rafael J. Wysocki 1 sibling, 0 replies; 75+ messages in thread From: Alan Stern @ 2010-06-24 14:45 UTC (permalink / raw) To: Andy Lutomirski Cc: Rafael J. Wysocki, mark gross, Neil Brown, Dmitry Torokhov, Linux Kernel Mailing List, Arve, Florian Mickler, Linux-pm mailing list On Thu, 24 Jun 2010, Andy Lutomirski wrote: > How does userspace handle this without races? (I don't see an example > in a driver that talks to userspace in your code...) > > For example, if I push a button on my keyboard, the driver calls > pm_wakeup_begin(). Then userspace reads the key from the evdev device > and tells the userspace suspend manager not to go to sleep. > > But there's a race: the keyboard driver (or input subsystem) could call > pm_wakeup_end() before the userspace program has a chance to tell the > suspend manager not to sleep. The assumption is that the user program will poll the evdev device. When the poll indicates data is available, the program will tell the userspace power manager not to go to sleep before reading the data. This avoids the race. > One possibility would be for poll to report that events are pending > without calling pm_wakeup_end(), giving userspace a chance to prevent > itself from suspending before actually reading the event. Exactly. The critical section doesn't end until the events have been read. Polling alone doesn't affect the critical section. > (Also, should "echo mem >/sys/power/state" be different from "echo > mem_respect_suspend_blockers >/sys/power/state?" If I physically press > the suspend key on my laptop, I want it to go to sleep even though I'm > still holding the Fn key that was part of the suspend hotkey.) With Rafael's scheme, the difference is not in the write to /sys/power/state but rather in the write to /sys/power/wakeup_count. If the power manager program doesn't write to /sys/power/wakeup_count first then /sys/power/state takes effect immediately, just as it does now. Alan Stern ^ permalink raw reply [flat|nested] 75+ messages in thread
* Re: [update] Re: [RFC][PATCH] PM: Avoid losing wakeup events during suspend 2010-06-24 14:16 ` [update] " Andy Lutomirski 2010-06-24 14:45 ` Alan Stern @ 2010-06-24 14:48 ` Rafael J. Wysocki 2010-06-24 15:21 ` Andy Lutomirski 1 sibling, 1 reply; 75+ messages in thread From: Rafael J. Wysocki @ 2010-06-24 14:48 UTC (permalink / raw) To: Andy Lutomirski Cc: Alan Stern, mark gross, Neil Brown, Dmitry Torokhov, Linux Kernel Mailing List, Arve, Florian Mickler, Linux-pm mailing list On Thursday, June 24, 2010, Andy Lutomirski wrote: > Rafael J. Wysocki wrote: > > On Tuesday, June 22, 2010, Rafael J. Wysocki wrote: > >> On Tuesday, June 22, 2010, Alan Stern wrote: > >>> On Tue, 22 Jun 2010, Rafael J. Wysocki wrote: > > ... > >>>> So, even if we can say when the kernel has finished processing the event > >>>> (although that would be complicated in the PCIe case above), I don't think > >>>> it's generally possible to ensure that the entire processing of a wakeup event > >>>> has been completed. This leads to the question whether or not it is worth > >>>> trying to detect the ending of the processing of a wakeup event. > >>> As Arve pointed out, in some cases it definitely is worthwhile (the > >>> gpio keypad matrix example). In other cases there may be no reasonable > >>> way to tell. That doesn't mean we have to give up entirely. > >> Well, I'm not sure, because that really depends on the hardware and bus in > >> question. The necessary condition seems to be that the event be detected > >> and handled entirely by the same functional unit (eg. a device driver) within > >> the kernel and such that it is able to detect whether or not user space has > >> acquired the event information. That doesn't seem to be a common case to me. > > > > Anyway, below's an update that addresses this particular case. > > > > It adds two more functions, pm_wakeup_begin() and pm_wakeup_end() > > that play similar roles to suspend_block() and suspend_unblock(), but they > > don't operate on suspend blocker objects. Instead, the first of them increases > > a counter of events in progress and the other one decreases this counter. > > Together they have the same effect as pm_wakeup_event(), but the counter > > of wakeup events in progress they operate on is also checked by > > pm_check_wakeup_events(). > > > > Thus there are two ways kernel subsystems can signal wakeup events. First, > > if the event is not explicitly handed over to user space and "instantaneous", > > they can simply call pm_wakeup_event() and be done with it. Second, if the > > event is going to be delivered to user space, the subsystem that processes > > the event can call pm_wakeup_begin() right when the event is detected and > > pm_wakeup_end() when it's been handed over to user space. > > How does userspace handle this without races? (I don't see an example > in a driver that talks to userspace in your code...) > > For example, if I push a button on my keyboard, the driver calls > pm_wakeup_begin(). Then userspace reads the key from the evdev device > and tells the userspace suspend manager not to go to sleep. > > But there's a race: the keyboard driver (or input subsystem) could call > pm_wakeup_end() before the userspace program has a chance to tell the > suspend manager not to sleep. That basically is what /sys/power/wakeup_count is for. The power manager is supposed to first read from it (that will block if there are any events in progress in the last, recently posted, version of the patch), obtain ACKs from user space event consumers known to it, then write to /sys/power/wakeup_count (that will fail if there were any wakeup events in the meantime) and write to /sys/power/state only if that was successful. Rafael ^ permalink raw reply [flat|nested] 75+ messages in thread
* Re: [update] Re: [RFC][PATCH] PM: Avoid losing wakeup events during suspend 2010-06-24 14:48 ` Rafael J. Wysocki @ 2010-06-24 15:21 ` Andy Lutomirski 0 siblings, 0 replies; 75+ messages in thread From: Andy Lutomirski @ 2010-06-24 15:21 UTC (permalink / raw) To: Rafael J. Wysocki Cc: Andy Lutomirski, Alan Stern, mark gross, Neil Brown, Dmitry Torokhov, Linux Kernel Mailing List, Arve, Florian Mickler, Linux-pm mailing list On Jun 24, 2010, at 10:48 AM, "Rafael J. Wysocki" <rjw@sisk.pl> wrote: > On Thursday, June 24, 2010, Andy Lutomirski wrote: >> Rafael J. Wysocki wrote: >>> On Tuesday, June 22, 2010, Rafael J. Wysocki wrote: >>>> On Tuesday, June 22, 2010, Alan Stern wrote: >>>>> On Tue, 22 Jun 2010, Rafael J. Wysocki wrote: >>> ... >>>>>> So, even if we can say when the kernel has finished processing >>>>>> the event >>>>>> (although that would be complicated in the PCIe case above), I >>>>>> don't think >>>>>> it's generally possible to ensure that the entire processing of >>>>>> a wakeup event >>>>>> has been completed. This leads to the question whether or not >>>>>> it is worth >>>>>> trying to detect the ending of the processing of a wakeup event. >>>>> As Arve pointed out, in some cases it definitely is worthwhile >>>>> (the >>>>> gpio keypad matrix example). In other cases there may be no >>>>> reasonable >>>>> way to tell. That doesn't mean we have to give up entirely. >>>> Well, I'm not sure, because that really depends on the hardware >>>> and bus in >>>> question. The necessary condition seems to be that the event be >>>> detected >>>> and handled entirely by the same functional unit (eg. a device >>>> driver) within >>>> the kernel and such that it is able to detect whether or not user >>>> space has >>>> acquired the event information. That doesn't seem to be a common >>>> case to me. >>> >>> Anyway, below's an update that addresses this particular case. >>> >>> It adds two more functions, pm_wakeup_begin() and pm_wakeup_end() >>> that play similar roles to suspend_block() and suspend_unblock(), >>> but they >>> don't operate on suspend blocker objects. Instead, the first of >>> them increases >>> a counter of events in progress and the other one decreases this >>> counter. >>> Together they have the same effect as pm_wakeup_event(), but the >>> counter >>> of wakeup events in progress they operate on is also checked by >>> pm_check_wakeup_events(). >>> >>> Thus there are two ways kernel subsystems can signal wakeup >>> events. First, >>> if the event is not explicitly handed over to user space and >>> "instantaneous", >>> they can simply call pm_wakeup_event() and be done with it. >>> Second, if the >>> event is going to be delivered to user space, the subsystem that >>> processes >>> the event can call pm_wakeup_begin() right when the event is >>> detected and >>> pm_wakeup_end() when it's been handed over to user space. >> >> How does userspace handle this without races? (I don't see an >> example >> in a driver that talks to userspace in your code...) >> >> For example, if I push a button on my keyboard, the driver calls >> pm_wakeup_begin(). Then userspace reads the key from the evdev >> device >> and tells the userspace suspend manager not to go to sleep. >> >> But there's a race: the keyboard driver (or input subsystem) could >> call >> pm_wakeup_end() before the userspace program has a chance to tell the >> suspend manager not to sleep. > > That basically is what /sys/power/wakeup_count is for. The power > manager is > supposed to first read from it (that will block if there are any > events in > progress in the last, recently posted, version of the patch), obtain > ACKs from > user space event consumers known to it, then write to > /sys/power/wakeup_count (that will fail if there were any wakeup > events in the > meantime) and write to /sys/power/state only if that was successful. That sounds a good deal more complicated than the poll, block suspend, read, unblock approach, but I guess that as long as both work, the kernel doesn't really have to care. > > Rafael ^ permalink raw reply [flat|nested] 75+ messages in thread
* Re: [RFC][PATCH] PM: Avoid losing wakeup events during suspend 2010-06-22 10:21 ` Rafael J. Wysocki 2010-06-22 14:35 ` Alan Stern @ 2010-06-22 23:00 ` mark gross 1 sibling, 0 replies; 75+ messages in thread From: mark gross @ 2010-06-22 23:00 UTC (permalink / raw) To: Rafael J. Wysocki Cc: Alan Stern, Florian Mickler, Linux-pm mailing list, Matthew Garrett, Linux Kernel Mailing List, Dmitry Torokhov, Arve Hjønnevåg, Neil Brown, mark gross On Tue, Jun 22, 2010 at 12:21:53PM +0200, Rafael J. Wysocki wrote: > On Tuesday, June 22, 2010, Rafael J. Wysocki wrote: > > On Tuesday, June 22, 2010, Alan Stern wrote: > > > On Mon, 21 Jun 2010, Florian Mickler wrote: > > > > > > > > In the end you would want to have communication in both directions: > > > > > suspend blockers _and_ callbacks. Polling is bad if done too often. > > > > > But I think the idea is a good one. > > > > > > > > Actually, I'm not so shure. > > > > > > > > 1. you have to roundtrip whereas in the suspend_blocker scheme you have > > > > active annotations (i.e. no further action needed) > > > > > > That's why it's best to use both. The normal case is that programs > > > activate and deactivate blockers by sending one-way messages to the PM > > > process. The exceptional case is when the PM process is about to > > > initiate a suspend; that's when it does the round-trip polling. Since > > > the only purpose of the polling is to avoid a race, 90% of the time it > > > will succeed. > > > > > > > 2. it may not be possible for a user to determine if a wake-event is > > > > in-flight. you would have to somehow pass the wake-event-number with > > > > it, so that the userspace process could ack it properly without > > > > confusion. Or... I don't know of anything else... > > > > > > > > 1. userspace-manager (UM) reads a number (42). > > > > > > > > 2. it questions userspace program X: is it ok to suspend? > > > > > > > > [please fill in how userspace program X determines to block > > > > suspend] > > > > > > > > 3a. UM's roundtrip ends and it proceeds to write "42" to the > > > > kernel [suspending] > > > > 3b. UM's roundtrip ends and it aborts suspend, because a > > > > (userspace-)suspend-blocker got activated > > > > > > > > I'm not shure how the userspace program could determine that there is a > > > > wake-event in flight. Perhaps by storing the number of last wake-event. > > > > But then you need per-wake-event-counters... :| > > > > > > Rafael seems to think timeouts will fix this. I'm not so sure. > > > > > > > Do you have some thoughts about the wake-event-in-flight detection? > > > > > > Not really, except for something like the original wakelock scheme in > > > which the kernel tells the PM core when an event is over. > > > > But the kernel doesn't really know that, so it really can't tell the PM core > > anything useful. What happens with suspend blockers is that a kernel subsystem > > cooperates with a user space consumer of the event to get the story straight. > > > > However, that will only work if the user space is not buggy and doesn't crash, > > for example, before releasing the suspend blocker it's holding. > > Having reconsidered that I think there's more to it. > > Take the PCI subsystem as an example, specifically pcie_pme_handle_request(). > This is the place where wakeup events are started, but it has no idea about > how they are going to be handled. Thus in the suspend blocker scheme it would > need to activate a blocker, but it wouldn't be able to release it. So, it > seems, we would need to associate a suspend blocker with each PCIe device > that can generate wakeup events and require all drivers of such devices to > deal with a blocker activated by someone else (PCIe PME driver in this > particular case). That sounds cumbersome to say the least. > > Moreover, even if we do that, it still doesn't solve the entire problem, > because the event may need to be delivered to user space and processed by it. > While a driver can check if user space has already read the event, it has > no way to detect whether or not it has finished processing it. In fact, > processing an event may involve an interaction with a (human) user and there's > no general way by which software can figure out when the user considers the > event as processed. > > It looks like user space suspend blockers only help in some special cases > when the user space processing of a wakeup event is simple enough, but I don't > think they help in general. For an extreme example, a user may want to wake up > a system using wake-on-LAN to log into it, do some work and log out, so > effectively the initial wakeup event has not been processed entirely until the > user finally logs out of the system. Now, after the system wakeup (resulting > from the wake-on-LAN signal) we need to give the user some time to log in, but > if the user doesn't do that in certain time, it may be reasonable to suspend > and let the user wake up the system again. > > Similar situation takes place when the system is woken up by a lid switch. > Evidently, the user has opened the laptop lid to do something, but we don't > even know what the user is going to do, so there's no way we can say when > the wakeup event is finally processed. > > So, even if we can say when the kernel has finished processing the event > (although that would be complicated in the PCIe case above), I don't think > it's generally possible to ensure that the entire processing of a wakeup event > has been completed. This leads to the question whether or not it is worth > trying to detect the ending of the processing of a wakeup event. > > Now, going back to the $subject patch, I didn't really think it would be > suitable for opportunistic suspend, so let's focus on the "forced" suspend > instead. It still has the problem that wakeup events occuring while > /sys/power/state is written to (or even slightly before) should cause the > system to cancel the suspend, but they generally won't. With the patch > applied that can be avoided by (a) reading from /sys/power/wakeup_count, > (b) waiting for certain time (such that if a suspend event is not entirely > processed within that time, it's worth suspending and waking up the > system again) and (c) writing to /sys/power/wakeup_count right before writing > to /sys/power/state (where the latter is only done if the former succeeds). > This is what thought was the problem your idea as trying to deal with. --mgross ^ permalink raw reply [flat|nested] 75+ messages in thread
* Re: [RFC][PATCH] PM: Avoid losing wakeup events during suspend 2010-06-21 5:32 ` Florian Mickler 2010-06-21 15:23 ` Alan Stern @ 2010-06-21 16:54 ` Alan Stern 2010-06-21 20:40 ` Florian Mickler 2010-06-21 21:18 ` Rafael J. Wysocki 1 sibling, 2 replies; 75+ messages in thread From: Alan Stern @ 2010-06-21 16:54 UTC (permalink / raw) To: Florian Mickler Cc: Rafael J. Wysocki, Linux-pm mailing list, Matthew Garrett, Linux Kernel Mailing List, Dmitry Torokhov, Arve Hjønnevåg, Neil Brown, mark gross On Mon, 21 Jun 2010, Florian Mickler wrote: > > > > What happens if an event arrives just before you read > > > > /sys/power/wakeup_count, but the userspace consumer doesn't realize > > > > there is a new unprocessed event until after the power manager checks > > > > it? > > > > > I think this is not the kernel's problem. In this approach the kernel makes it > > > possible for the user space to avoid the race. Whether or not the user space > > > will use this opportunity is a different matter. > > > > It is _not_ possible for userspace to avoid this race. Help from the > > kernel is needed. > > It is possible if every (relevant) userspace program implements a > callback for the powermanager to check if one of it's wakeup-sources > got activated. > > That way the powermanager would read /sys/power/wakeup_count, then do > the roundtrip to all it's registered users and only then suspend. After further thought, there's still a race: A wakeup event arrives. The kernel increments /sys/power/wakeup_count and starts processing the event. The power-manager process reads /sys/power/wakeup_count. The power-manager process polls the relevant programs and they all say no events are pending. The power-manager process successfully writes /sys/power/wakeup_count. The power-manager process initiates a suspend. ... Hours later ... The system wakes up. The kernel finishes its internal processing of the event and sends a notification to a user program. The problem here is that the power-manager process can't tell when the kernel has finished processing an event. This is true both for events that need to propagate to userspace and for events that are handled entirely by the kernel. This is a reflection of the two distinct pieces of information that we need to keep track of: A wakeup event has arrived, so it's no longer safe to suspend. Wakeup events are no longer pending, so it's once again safe to suspend. The wakeup_count interface tracks the first, but in this scheme nothing tracks the second. Alan Stern ^ permalink raw reply [flat|nested] 75+ messages in thread
* Re: [RFC][PATCH] PM: Avoid losing wakeup events during suspend 2010-06-21 16:54 ` Alan Stern @ 2010-06-21 20:40 ` Florian Mickler 2010-06-21 21:18 ` Rafael J. Wysocki 1 sibling, 0 replies; 75+ messages in thread From: Florian Mickler @ 2010-06-21 20:40 UTC (permalink / raw) To: Alan Stern Cc: Rafael J. Wysocki, Linux-pm mailing list, Matthew Garrett, Linux Kernel Mailing List, Dmitry Torokhov, Arve Hjønnevåg, Neil Brown, mark gross On Mon, 21 Jun 2010 12:54:44 -0400 (EDT) Alan Stern <stern@rowland.harvard.edu> wrote: > On Mon, 21 Jun 2010, Florian Mickler wrote: > > > > > > What happens if an event arrives just before you read > > > > > /sys/power/wakeup_count, but the userspace consumer doesn't realize > > > > > there is a new unprocessed event until after the power manager checks > > > > > it? > > > > > > > I think this is not the kernel's problem. In this approach the kernel makes it > > > > possible for the user space to avoid the race. Whether or not the user space > > > > will use this opportunity is a different matter. > > > > > > It is _not_ possible for userspace to avoid this race. Help from the > > > kernel is needed. > > > > It is possible if every (relevant) userspace program implements a > > callback for the powermanager to check if one of it's wakeup-sources > > got activated. > > > > That way the powermanager would read /sys/power/wakeup_count, then do > > the roundtrip to all it's registered users and only then suspend. > > After further thought, there's still a race: Ah, yes. That's what I meant in my other mail with 'how to determine that an wake event is in flight'. Read this too late. > Alan Stern Cheers, Flo ^ permalink raw reply [flat|nested] 75+ messages in thread
* Re: [RFC][PATCH] PM: Avoid losing wakeup events during suspend 2010-06-21 16:54 ` Alan Stern 2010-06-21 20:40 ` Florian Mickler @ 2010-06-21 21:18 ` Rafael J. Wysocki 2010-06-21 22:27 ` Alan Stern 1 sibling, 1 reply; 75+ messages in thread From: Rafael J. Wysocki @ 2010-06-21 21:18 UTC (permalink / raw) To: Alan Stern Cc: Florian Mickler, Linux-pm mailing list, Matthew Garrett, Linux Kernel Mailing List, Dmitry Torokhov, Arve Hjønnevåg, Neil Brown, mark gross On Monday, June 21, 2010, Alan Stern wrote: > On Mon, 21 Jun 2010, Florian Mickler wrote: > > > > > > What happens if an event arrives just before you read > > > > > /sys/power/wakeup_count, but the userspace consumer doesn't realize > > > > > there is a new unprocessed event until after the power manager checks > > > > > it? > > > > > > > I think this is not the kernel's problem. In this approach the kernel makes it > > > > possible for the user space to avoid the race. Whether or not the user space > > > > will use this opportunity is a different matter. > > > > > > It is _not_ possible for userspace to avoid this race. Help from the > > > kernel is needed. > > > > It is possible if every (relevant) userspace program implements a > > callback for the powermanager to check if one of it's wakeup-sources > > got activated. > > > > That way the powermanager would read /sys/power/wakeup_count, then do > > the roundtrip to all it's registered users and only then suspend. > > After further thought, there's still a race: > > A wakeup event arrives. > > The kernel increments /sys/power/wakeup_count and starts > processing the event. > > The power-manager process reads /sys/power/wakeup_count. You assume that these two steps will occur instantaneously one after the other, while there may be (and in fact there should be) a delay between them. I would make the power manager wait for certain time after reading /sys/power/wakeup_count and if no wakeup events are reported to it within that time, to write to /sys/power/wakeup_count. The length of the time to wait would be system-dependent in general, but I'd also allow the event consumers to influence it (like when an application knows it will process things for 10 minutes going forward, so it tells the power manager to wait for at least 10 minutes before attempting to suspend). > The power-manager process polls the relevant programs and > they all say no events are pending. > > The power-manager process successfully writes > /sys/power/wakeup_count. > > The power-manager process initiates a suspend. > > ... Hours later ... > > The system wakes up. > > The kernel finishes its internal processing of the event and > sends a notification to a user program. > > The problem here is that the power-manager process can't tell when the > kernel has finished processing an event. This is true both for events > that need to propagate to userspace and for events that are handled > entirely by the kernel. > > This is a reflection of the two distinct pieces of information that we > need to keep track of: > > A wakeup event has arrived, so it's no longer safe to suspend. > > Wakeup events are no longer pending, so it's once again > safe to suspend. > > The wakeup_count interface tracks the first, but in this scheme nothing > tracks the second. Which I don't think is really necessary, because we'll need to use timeouts anyway, at least for events that have no user space consumers. Rafael ^ permalink raw reply [flat|nested] 75+ messages in thread
* Re: [RFC][PATCH] PM: Avoid losing wakeup events during suspend 2010-06-21 21:18 ` Rafael J. Wysocki @ 2010-06-21 22:27 ` Alan Stern 0 siblings, 0 replies; 75+ messages in thread From: Alan Stern @ 2010-06-21 22:27 UTC (permalink / raw) To: Rafael J. Wysocki Cc: Florian Mickler, Linux-pm mailing list, Matthew Garrett, Linux Kernel Mailing List, Dmitry Torokhov, Arve Hjønnevåg, Neil Brown, mark gross On Mon, 21 Jun 2010, Rafael J. Wysocki wrote: > > After further thought, there's still a race: > > > > A wakeup event arrives. > > > > The kernel increments /sys/power/wakeup_count and starts > > processing the event. > > > > The power-manager process reads /sys/power/wakeup_count. > > You assume that these two steps will occur instantaneously one after the other, > while there may be (and in fact there should be) a delay between them. No, I'm not assuming that. > I would make the power manager wait for certain time after reading > /sys/power/wakeup_count and if no wakeup events are reported to it within > that time, to write to /sys/power/wakeup_count. Why? That's just wasted time -- time during which the system could have been suspended. I can understand the power manager might reason as follows: If this wakeup event hasn't been handed over to a userspace program in the next 5 seconds, I'm going to suspend anyway on the theory that something is wrong. But why do that when you can get exact information? > The length of the time to wait would be system-dependent in general, but I'd > also allow the event consumers to influence it (like when an application knows > it will process things for 10 minutes going forward, so it tells the power > manager to wait for at least 10 minutes before attempting to suspend). It tells the power manager to wait by activating a userspace suspend blocker. While a blocker is active, the power manager doesn't have to poll /sys/power/wakeup_count or anything; it just waits for all the suspend blockers to be deactivated. And there's no guesswork involved; the power manager knows immediately when it's time to try suspending again. > > The power-manager process polls the relevant programs and > > they all say no events are pending. > > > > The power-manager process successfully writes > > /sys/power/wakeup_count. > > > > The power-manager process initiates a suspend. > > > > ... Hours later ... > > > > The system wakes up. > > > > The kernel finishes its internal processing of the event and > > sends a notification to a user program. > > > > The problem here is that the power-manager process can't tell when the > > kernel has finished processing an event. This is true both for events > > that need to propagate to userspace and for events that are handled > > entirely by the kernel. > > > > This is a reflection of the two distinct pieces of information that we > > need to keep track of: > > > > A wakeup event has arrived, so it's no longer safe to suspend. > > > > Wakeup events are no longer pending, so it's once again > > safe to suspend. > > > > The wakeup_count interface tracks the first, but in this scheme nothing > > tracks the second. > > Which I don't think is really necessary, because we'll need to use timeouts > anyway, at least for events that have no user space consumers. You wouldn't need to use timeouts for them either if the kernel had a way to indicate when it was finished processing events. Alan Stern ^ permalink raw reply [flat|nested] 75+ messages in thread
* Re: [RFC][PATCH] PM: Avoid losing wakeup events during suspend 2010-06-21 2:23 ` Alan Stern 2010-06-21 5:32 ` Florian Mickler @ 2010-06-21 6:13 ` mark gross 2010-06-21 12:10 ` tytso 2010-06-21 16:01 ` Alan Stern 2010-06-21 21:58 ` Rafael J. Wysocki 2 siblings, 2 replies; 75+ messages in thread From: mark gross @ 2010-06-21 6:13 UTC (permalink / raw) To: Alan Stern Cc: Rafael J. Wysocki, Linux-pm mailing list, Matthew Garrett, Linux Kernel Mailing List, Dmitry Torokhov, Arve Hjønnevåg, Neil Brown, mark gross On Sun, Jun 20, 2010 at 10:23:38PM -0400, Alan Stern wrote: > On Sun, 20 Jun 2010, Rafael J. Wysocki wrote: > > > > > Generally, there are two problems in that area. First, if a wakeup event > > > > occurs exactly at the same time when /sys/power/state is being written to, > > > > the even may be delivered to user space right before the freezing of it, > > > > in which case the user space consumer of the event may not be able to process > > > > it before the system is suspended. > > > > > > Indeed, the same problem arises if the event isn't delivered to > > > userspace until after userspace is frozen. > > > > In that case the kernel should abort the suspend so that the event can be > > delivered to the user space. > > Yes. > > > > Of course, the underlying issue here is that the kernel has no direct way > > > to know when userspace has finished processing an event. Userspace would > > > have to tell it, which generally would mean rewriting some large number of user > > > programs. > > > > I'm not sure of that. If the kernel doesn't initiate suspend, it doesn't > > really need to know whether or not user space has already consumed the event. > > That's true. But it only shifts the onus: When a userspace program has > finished processing an event, it has to tell the power-manager process. > Clearly this sort of thing is unavoidable, one way or another. > > > > > The following patch illustrates my idea of how these two problems may be > > > > addressed. It introduces a new global sysfs attribute, > > > > /sys/power/wakeup_count, associated with a running counter of wakeup events > > > > and a helper function, pm_wakeup_event(), that may be used by kernel subsystems > > > > to increment the wakeup events counter. > > > > > > In what way is this better than suspend blockers? > > > > It doesn't add any new framework and it doesn't require the users of > > pm_wakeup_event() to "unblock" suspend, so it is simpler. It also doesn't add > > the user space interface that caused so much opposition to appear. > > Okay. A quick comparison shows that in your proposal: > > There's no need to register and unregister suspend blockers. > But instead you create the equivalent of a suspend blocker > inside every struct device. > > Drivers (or subsystems) don't have to activate suspend > blockers. But instead they have to call pm_wakeup_event(). > > Drivers don't have to deactivate suspend blockers. You don't > have anything equivalent, and as a result your scheme is > subject to the race described below. > > There are no userspace suspend blockers and no opportunistic > suspend. Instead a power-manager process takes care of > initiating or preventing suspends as needed. > > In short, you have eliminated the userspace part of the suspend blocker > approach just as in some of the proposals posted earlier, and you have > replaced the in-kernel suspend blockers with new data in struct device > and a new PM API. On the whole, it doesn't seem very different from > the in-kernel part of suspend blockers. The most notable difference is > the name: pm_wake_event() vs. suspend_blocker_activate(), or whatever > it ended up being called. > > This is the race I was talking about: Your confused about what problem this patch attempts to solve. There is a pm_qos patch in the works to address the suspend blocker functionality. http://lists.linux-foundation.org/pipermail/linux-pm/2010-June/026760.html --mgross > > > > What happens if an event arrives just before you read > > > /sys/power/wakeup_count, but the userspace consumer doesn't realize > > > there is a new unprocessed event until after the power manager checks > > > it? > > > I think this is not the kernel's problem. In this approach the kernel makes it > > possible for the user space to avoid the race. Whether or not the user space > > will use this opportunity is a different matter. > > It is _not_ possible for userspace to avoid this race. Help from the > kernel is needed. > > > > Your plan is missing a critical step: the "handoff" whereby > > > responsibility for handling an event passes from the kernel to > > > userspace. > > > > With suspend blockers, this handoff occurs when an event queue is > > > emptied and its associate suspend blocker is deactivated. Or with some > > > kinds of events for which the Android people have not written an > > > explicit handoff, it occurs when a timer expires (timed suspend > > > blockers). > > > > Well, quite frankly, I don't see any difference here. In either case there is > > a possibility for user space to mess up things and the kernel can't really help > > that. > > With suspend blockers, there is also the possibility for userspace to > handle races correctly. But with your scheme there isn't -- that's the > difference. > > > > This shares with the other alternatives posted recently the need for a > > > central power-manager process. And like in-kernel suspend blockers, it > > > requires changes to wakeup-capable drivers (the wakeup-events counter > > > has to be incremented). > > > > It doesn't really require changes to drivers, but to code that knows of wakeup > > events, like the PCI runtime wakeup code. > > Just like in-kernel suspend blockers. > > > Moreover, it doesn't require kernel > > subsystems to know or even care when it is reasonable to allow suspend to > > happen. The only thing they need to do is to call pm_wakeup_event() whenever > > they see a wakeup event. > > That's just semantics. Obviously a wakeup event should prevent suspend > from happening, so if subsystems know or care about one then they know > or care about the other. > > > I don't really think it is too much of a requirement > > (and quite frnakly I can't imagine anything simpler than that). > > This is because you have omitted the part about allowing suspends again > (or if you prefer, about notifying the PM core that a wakeup event has > been handed off to userspace). As a result of leaving this out, you > haven't eliminated all the races. > > > Yes, it does, but I have an idea about how to implement such a power manager > > and I'm going to actually try it. > > A logical design would be to use dbus for disseminating PM-related > information. Does your idea work that way? > > > I don't think any of the approaches that don't use suspend blockers allows > > one to avoid the race between the process that writes to /sys/power/state > > and a wakeup event happening at the same time. They attempt to address another > > issue, which is how to prevent untrusted user space processes from keeping the > > system out of idle, but that is a different story. > > Well, there was one approach that didn't use suspend blockers and did > solve the race: the original wakelocks proposal. Of course, that was > just suspend blockers under a different name. One could make a very > good case that your scheme is also suspend blockers under a different > name (and with an important part missing). > > Alan Stern > ^ permalink raw reply [flat|nested] 75+ messages in thread
* Re: [RFC][PATCH] PM: Avoid losing wakeup events during suspend 2010-06-21 6:13 ` mark gross @ 2010-06-21 12:10 ` tytso 2010-06-21 12:22 ` Alan Cox 2010-06-22 1:07 ` mark gross 2010-06-21 16:01 ` Alan Stern 1 sibling, 2 replies; 75+ messages in thread From: tytso @ 2010-06-21 12:10 UTC (permalink / raw) To: markgross Cc: Alan Stern, Rafael J. Wysocki, Linux-pm mailing list, Matthew Garrett, Linux Kernel Mailing List, Dmitry Torokhov, Arve Hjønnevåg, Neil Brown, mark gross So where are we at this point? Discussion had completely died down for a while, and it's picked up again, but it's not clear to me that we're any closer to reaching consensus. There's been one proposal that we simply merge in a set of no-op inline functions for suspend blockers, just so we can get let the drivers go in (assuming that Greg K-H believes this is still a problem), but with an automatical removal of N months (N to be decided, say 9 or 12 or 18 months). My concern is that if we do that, we will have simply kicked the ball down the road for N months. Another approach is to simply merge in no-op functions and not leave any kind of deprecation schedule. That's sort of an implicit admission of the fact that we may not reach consensus on this issue. Or we could simply ship the patches as-is to Linus after he gets back from vacation and ask him for a thumbs up or thumbs down vote, which might settle things once and for all. How do we go forward from here? - Ted ^ permalink raw reply [flat|nested] 75+ messages in thread
* Re: [RFC][PATCH] PM: Avoid losing wakeup events during suspend 2010-06-21 12:10 ` tytso @ 2010-06-21 12:22 ` Alan Cox 2010-06-21 12:26 ` Florian Mickler 2010-06-21 13:42 ` tytso 2010-06-22 1:07 ` mark gross 1 sibling, 2 replies; 75+ messages in thread From: Alan Cox @ 2010-06-21 12:22 UTC (permalink / raw) To: tytso Cc: markgross, Alan Stern, Rafael J. Wysocki, Linux-pm mailing list, Matthew Garrett, Linux Kernel Mailing List, Dmitry Torokhov, Arve Hjønnevåg, Neil Brown, mark gross > There's been one proposal that we simply merge in a set of no-op > inline functions for suspend blockers, just so we can get let the > drivers go in (assuming that Greg K-H believes this is still a > problem), but with an automatical removal of N months (N to be > decided, say 9 or 12 or 18 months). If you automatically remove the suspend blocker wrappers in 12 months then you can keep the drivers in tree. The drivers are generally not a problem (ok some of them like the binder are interesting little trips) its just those hooks, and we'd all benefit somewhat even if the only bit google are patching back in are their hooks. > down the road for N months. Another approach is to simply merge in > no-op functions and not leave any kind of deprecation schedule. > That's sort of an implicit admission of the fact that we may not reach > consensus on this issue. Or we could simply ship the patches as-is to Very bad precedent. What happens when every other vendor does the same ? Keep the upstream code clean. > Linus after he gets back from vacation and ask him for a thumbs up or > thumbs down vote, which might settle things once and for all. You seem desperate to just throw it at Linus - you have been all along before the discussion, during it and now: but I don't understand why ? It's as if you don't want progress to be made by other means ? > How do we go forward from here? PM QoS seems to be evolving nicely. As for merging stuff - if its new code then it should get submitted with the hooks left out anyway, just as all the other vendors are doing with weird little custom hooks of their own. The hooks can still easily be patched back in as a tiny easy to maintain vendor patch. This works - the code still gets updated and seen by people, only the little hook patch has to be maintained and generally it looks after itself except for the odd .rej when something changes right up by the merge point. Alan ^ permalink raw reply [flat|nested] 75+ messages in thread
* Re: [RFC][PATCH] PM: Avoid losing wakeup events during suspend 2010-06-21 12:22 ` Alan Cox @ 2010-06-21 12:26 ` Florian Mickler 2010-06-21 13:42 ` tytso 1 sibling, 0 replies; 75+ messages in thread From: Florian Mickler @ 2010-06-21 12:26 UTC (permalink / raw) To: Alan Cox Cc: tytso, mark gross, markgross, Neil Brown, Dmitry Torokhov, Linux Kernel Mailing List, Linux-pm mailing list On Mon, 21 Jun 2010 13:22:34 +0100 Alan Cox <alan@lxorguk.ukuu.org.uk> wrote: > > How do we go forward from here? > > PM QoS seems to be evolving nicely. Still, it doesn't solve the userspace-part. Cheers, Flo ^ permalink raw reply [flat|nested] 75+ messages in thread
* Re: [RFC][PATCH] PM: Avoid losing wakeup events during suspend 2010-06-21 12:22 ` Alan Cox 2010-06-21 12:26 ` Florian Mickler @ 2010-06-21 13:42 ` tytso 2010-06-21 14:01 ` Alan Cox 1 sibling, 1 reply; 75+ messages in thread From: tytso @ 2010-06-21 13:42 UTC (permalink / raw) To: Alan Cox Cc: markgross, Alan Stern, Rafael J. Wysocki, Linux-pm mailing list, Matthew Garrett, Linux Kernel Mailing List, Dmitry Torokhov, Arve Hjønnevåg, Neil Brown, mark gross On Mon, Jun 21, 2010 at 01:22:34PM +0100, Alan Cox wrote: > > You seem desperate to just throw it at Linus - you have been all along > before the discussion, during it and now: but I don't understand why ? Why are you so afraid to let Linus make the call? He's the benevolent dictator, isn't he? > It's as if you don't want progress to be made by other means ? It doesn't look like progress is being made here. - Ted ^ permalink raw reply [flat|nested] 75+ messages in thread
* Re: [RFC][PATCH] PM: Avoid losing wakeup events during suspend 2010-06-21 13:42 ` tytso @ 2010-06-21 14:01 ` Alan Cox 0 siblings, 0 replies; 75+ messages in thread From: Alan Cox @ 2010-06-21 14:01 UTC (permalink / raw) To: tytso Cc: markgross, Alan Stern, Rafael J. Wysocki, Linux-pm mailing list, Matthew Garrett, Linux Kernel Mailing List, Dmitry Torokhov, Arve Hjønnevåg, Neil Brown, mark gross On Mon, 21 Jun 2010 09:42:49 -0400 tytso@mit.edu wrote: > On Mon, Jun 21, 2010 at 01:22:34PM +0100, Alan Cox wrote: > > > > You seem desperate to just throw it at Linus - you have been all along > > before the discussion, during it and now: but I don't understand why ? > > Why are you so afraid to let Linus make the call? He's the benevolent > dictator, isn't he? I don't mind him laughing at it. I'm just curious why you've tried to short circuit the more general discussion repeatedly - even right at the start. > It doesn't look like progress is being made here. On suspend blockers no - on some of the real problems yes. ^ permalink raw reply [flat|nested] 75+ messages in thread
* Re: [RFC][PATCH] PM: Avoid losing wakeup events during suspend 2010-06-21 12:10 ` tytso 2010-06-21 12:22 ` Alan Cox @ 2010-06-22 1:07 ` mark gross 1 sibling, 0 replies; 75+ messages in thread From: mark gross @ 2010-06-22 1:07 UTC (permalink / raw) To: tytso, markgross, Alan Stern, Rafael J. Wysocki, Linux-pm mailing list, Matthew Garrett, Linux Kernel Mailing List, Dmitry Torokhov, Arve Hjønnevåg, Neil Brown, mark gross On Mon, Jun 21, 2010 at 08:10:58AM -0400, tytso@mit.edu wrote: > So where are we at this point? The patches are in James Bottomly's tree. > Discussion had completely died down for a while, and it's picked up > again, but it's not clear to me that we're any closer to reaching > consensus. I thought we (linux community and Android) where ok with the plist / pm-qos implementation of the building blocks needed to implement the suspend blocker feature on top of a pm-qos request class (I think the name was "interactive") pretty much the exact same symantecs as the suspend blocker thing, just with pm-qos kernel api's. > There's been one proposal that we simply merge in a set of no-op > inline functions for suspend blockers, just so we can get let the > drivers go in (assuming that Greg K-H believes this is still a > problem), but with an automatical removal of N months (N to be > decided, say 9 or 12 or 18 months). I'd rather see the re-tooling of pmqos happen. > > My concern is that if we do that, we will have simply kicked the ball > down the road for N months. Another approach is to simply merge in > no-op functions and not leave any kind of deprecation schedule. > That's sort of an implicit admission of the fact that we may not reach > consensus on this issue. Or we could simply ship the patches as-is to > Linus after he gets back from vacation and ask him for a thumbs up or > thumbs down vote, which might settle things once and for all. > > How do we go forward from here? > put the pm_qos -plist update into linux-next? --mgross ^ permalink raw reply [flat|nested] 75+ messages in thread
* Re: [RFC][PATCH] PM: Avoid losing wakeup events during suspend 2010-06-21 6:13 ` mark gross 2010-06-21 12:10 ` tytso @ 2010-06-21 16:01 ` Alan Stern 2010-06-22 1:25 ` mark gross 1 sibling, 1 reply; 75+ messages in thread From: Alan Stern @ 2010-06-21 16:01 UTC (permalink / raw) To: markgross Cc: Rafael J. Wysocki, Linux-pm mailing list, Matthew Garrett, Linux Kernel Mailing List, Dmitry Torokhov, Arve Hjønnevåg, Neil Brown, mark gross On Sun, 20 Jun 2010, mark gross wrote: > Your confused about what problem this patch attempts to solve. I don't think so. Rafael's description was pretty clear. > There is > a pm_qos patch in the works to address the suspend blocker > functionality. > http://lists.linux-foundation.org/pipermail/linux-pm/2010-June/026760.html No. That patch addresses something _similar_ to the suspend blocker functionality. The fact remains, though, that pm_qos is not used during system suspend (the /sys/power/state interface), hence changes to pm_qos won't solve the system-suspend problems that suspend blockers do solve. Alan Stern ^ permalink raw reply [flat|nested] 75+ messages in thread
* Re: [RFC][PATCH] PM: Avoid losing wakeup events during suspend 2010-06-21 16:01 ` Alan Stern @ 2010-06-22 1:25 ` mark gross 2010-06-22 2:24 ` Alan Stern 0 siblings, 1 reply; 75+ messages in thread From: mark gross @ 2010-06-22 1:25 UTC (permalink / raw) To: Alan Stern Cc: markgross, Rafael J. Wysocki, Linux-pm mailing list, Matthew Garrett, Linux Kernel Mailing List, Dmitry Torokhov, Arve Hjønnevåg, Neil Brown, mark gross On Mon, Jun 21, 2010 at 12:01:09PM -0400, Alan Stern wrote: > On Sun, 20 Jun 2010, mark gross wrote: > > > Your confused about what problem this patch attempts to solve. > > I don't think so. Rafael's description was pretty clear. Then how is it you don't understand the fact that Rafael's patch is to solve the wake event notification suspend race and not block opertunistic suspends or kernel critical sections where suspending should be disabled? > > > There is > > a pm_qos patch in the works to address the suspend blocker > > functionality. > > http://lists.linux-foundation.org/pipermail/linux-pm/2010-June/026760.html > > No. That patch addresses something _similar_ to the suspend blocker > functionality. The fact remains, though, that pm_qos is not used > during system suspend (the /sys/power/state interface), hence changes > to pm_qos won't solve the system-suspend problems that suspend blockers > do solve. You keep saying they solve something, I keep wondering what you are talking aobut. Lets see what problems it solves: * implements oppertunistic suspending (this is a feature not a problem) * enables kernel critical sections blocking suspending. * requiers overlapping application specific critcal sections from ISR into user mode to make implementation correct. * exposes a user mode interface to set a critical section. * reduces races between wake events (or suspend blocking events) but I'm not convinced it solves them. suspend blockers provide a way to block oppertunistic suspending, wich I'll have you know, is a pain to get working right and the enabling from device to device is not very portable and *that* doesn't say good things about the scheme. --mgross ^ permalink raw reply [flat|nested] 75+ messages in thread
* Re: [RFC][PATCH] PM: Avoid losing wakeup events during suspend 2010-06-22 1:25 ` mark gross @ 2010-06-22 2:24 ` Alan Stern 0 siblings, 0 replies; 75+ messages in thread From: Alan Stern @ 2010-06-22 2:24 UTC (permalink / raw) To: markgross Cc: Rafael J. Wysocki, Linux-pm mailing list, Matthew Garrett, Linux Kernel Mailing List, Dmitry Torokhov, Arve Hjønnevåg, Neil Brown, mark gross On Mon, 21 Jun 2010, mark gross wrote: > On Mon, Jun 21, 2010 at 12:01:09PM -0400, Alan Stern wrote: > > On Sun, 20 Jun 2010, mark gross wrote: > > > > > Your confused about what problem this patch attempts to solve. > > > > I don't think so. Rafael's description was pretty clear. > > Then how is it you don't understand the fact that Rafael's patch is to > solve the wake event notification suspend race and not block opertunistic > suspends or kernel critical sections where suspending should be disabled? I don't know what gave you the idea that I think Rafael's patch is meant to block kernel critical sections. I certainly don't think that. However leaving that aside, the rest of the above is just two different ways of saying the same thing: Wakeup events should cause suspend to be disabled until the events are processed. Arrival of wakeup events races with initiation of system suspend (whether opportunistic or not). Therefore blocking suspends when they ought to be disabled requires us to solve the wake event/suspend race. You can't do one without doing the other. > > > There is > > > a pm_qos patch in the works to address the suspend blocker > > > functionality. > > > http://lists.linux-foundation.org/pipermail/linux-pm/2010-June/026760.html > > > > No. That patch addresses something _similar_ to the suspend blocker > > functionality. The fact remains, though, that pm_qos is not used > > during system suspend (the /sys/power/state interface), hence changes > > to pm_qos won't solve the system-suspend problems that suspend blockers > > do solve. > > You keep saying they solve something, I keep wondering what you are > talking aobut. > Lets see what problems it solves: > * implements oppertunistic suspending (this is a feature not a problem) > * enables kernel critical sections blocking suspending. > * requiers overlapping application specific critcal sections from ISR > into user mode to make implementation correct. > * exposes a user mode interface to set a critical section. > * reduces races between wake events (or suspend blocking events) but I'm > not convinced it solves them. The last item on your list is what I meant. The others are not problems solved by suspend blockers. Well, maybe the second is, but I don't know of any kernel critical sections that need to block suspend other than those caused by wakeup events. I agree with you that bundling opportunistic suspend and the user mode interface together with suspend blockers made the situation more difficult by mixing up the important issues. > suspend blockers provide a way to block oppertunistic suspending, wich > I'll have you know, is a pain to get working right and the enabling from > device to device is not very portable and *that* doesn't say good things > about the scheme. The real problem with the scheme is to define which events should count as "wakeup" events and to determine when they have been fully handled. I can see that these might well vary from one platform to another. (Maybe that's what you mean by saying the enabling is not very portable.) But these issues are unavoidable; any scheme will have to address them. Alan Stern ^ permalink raw reply [flat|nested] 75+ messages in thread
* Re: [RFC][PATCH] PM: Avoid losing wakeup events during suspend 2010-06-21 2:23 ` Alan Stern 2010-06-21 5:32 ` Florian Mickler 2010-06-21 6:13 ` mark gross @ 2010-06-21 21:58 ` Rafael J. Wysocki 2 siblings, 0 replies; 75+ messages in thread From: Rafael J. Wysocki @ 2010-06-21 21:58 UTC (permalink / raw) To: Alan Stern Cc: Linux-pm mailing list, Matthew Garrett, Linux Kernel Mailing List, Dmitry Torokhov, Arve Hjønnevåg, Neil Brown, mark gross On Monday, June 21, 2010, Alan Stern wrote: > On Sun, 20 Jun 2010, Rafael J. Wysocki wrote: > > > > > Generally, there are two problems in that area. First, if a wakeup event > > > > occurs exactly at the same time when /sys/power/state is being written to, > > > > the even may be delivered to user space right before the freezing of it, > > > > in which case the user space consumer of the event may not be able to process > > > > it before the system is suspended. > > > > > > Indeed, the same problem arises if the event isn't delivered to > > > userspace until after userspace is frozen. > > > > In that case the kernel should abort the suspend so that the event can be > > delivered to the user space. > > Yes. > > > > Of course, the underlying issue here is that the kernel has no direct way > > > to know when userspace has finished processing an event. Userspace would > > > have to tell it, which generally would mean rewriting some large number of user > > > programs. > > > > I'm not sure of that. If the kernel doesn't initiate suspend, it doesn't > > really need to know whether or not user space has already consumed the event. > > That's true. But it only shifts the onus: When a userspace program has > finished processing an event, it has to tell the power-manager process. > Clearly this sort of thing is unavoidable, one way or another. That's correct, but there already are multiple ways to communicate things between user space processes, while we would need new interfaces to pass that information between user space and the kernel. That makes quite some difference. > > > > The following patch illustrates my idea of how these two problems may be > > > > addressed. It introduces a new global sysfs attribute, > > > > /sys/power/wakeup_count, associated with a running counter of wakeup events > > > > and a helper function, pm_wakeup_event(), that may be used by kernel subsystems > > > > to increment the wakeup events counter. > > > > > > In what way is this better than suspend blockers? > > > > It doesn't add any new framework and it doesn't require the users of > > pm_wakeup_event() to "unblock" suspend, so it is simpler. It also doesn't add > > the user space interface that caused so much opposition to appear. > > Okay. A quick comparison shows that in your proposal: > > There's no need to register and unregister suspend blockers. > But instead you create the equivalent of a suspend blocker > inside every struct device. Not really. That thing is for statistics purposes only and it's not essential from the functionality POV. But you know that. :-) > Drivers (or subsystems) don't have to activate suspend > blockers. But instead they have to call pm_wakeup_event(). > > Drivers don't have to deactivate suspend blockers. You don't > have anything equivalent, and as a result your scheme is > subject to the race described below. The original one without user space variant of suspend blockers is subject to it as well, because the kernel doesn't know when the user space consumer is going to finish processing the event. Rafael ^ permalink raw reply [flat|nested] 75+ messages in thread
* Re: [RFC][PATCH] PM: Avoid losing wakeup events during suspend 2010-06-20 16:28 ` Alan Stern 2010-06-20 21:50 ` Rafael J. Wysocki @ 2010-06-20 22:58 ` mark gross 2010-06-21 2:33 ` Alan Stern 1 sibling, 1 reply; 75+ messages in thread From: mark gross @ 2010-06-20 22:58 UTC (permalink / raw) To: Alan Stern Cc: Rafael J. Wysocki, linux-pm, Matthew Garrett, Linux Kernel Mailing List, Dmitry Torokhov, Arve Hjønnevåg, Neil Brown, mark gross On Sun, Jun 20, 2010 at 12:28:29PM -0400, Alan Stern wrote: > On Sun, 20 Jun 2010, Rafael J. Wysocki wrote: > > > Hi, > > > > One of the arguments during the suspend blockers discussion was that the > > mainline kernel didn't contain any mechanisms allowing it to avoid losing > > wakeup events during system suspend. > > > > Generally, there are two problems in that area. First, if a wakeup event > > occurs exactly at the same time when /sys/power/state is being written to, > > the even may be delivered to user space right before the freezing of it, > > in which case the user space consumer of the event may not be able to process > > it before the system is suspended. > > Indeed, the same problem arises if the event isn't delivered to > userspace until after userspace is frozen. Of course, the underlying > issue here is that the kernel has no direct way to know when userspace > has finished processing an event. Userspace would have to tell it, > which generally would mean rewriting some large number of user > programs. Suspend blockers don't solve this, and the large number of user programs isn't a big number. /me thinks its 1 or 2 services. > > Second, if a wakeup event occurs after user > > space has been frozen and that event is not a wakeup interrupt, the kernel will > > not react to it and the system will be suspended. > > I don't quite understand what you mean here. "Reacting" to an event > involves more than one action. The kernel has to tell the hardware to > stop generating the wakeup signal, and it has to handle the event > somehow. > > If the kernel doesn't tell the hardware to stop generating the wakeup > signal, the signal will continue to be active until the system goes to > sleep. At that point it will cause the system to wake up immediately, > so there won't be any problem. > > The real problem arises when the hardware stops generating the wakeup > signal but the kernel suspends before it finishes handling the event. > For example, an interrupt handler might receive the event and start > processing it by calling pm_request_resume() -- but if the pm workqueue > thread is already frozen then the processing won't finish until > something else wakes the system up. (IMO this is a potential bug which > could be fixed without too much effort.) > > > The following patch illustrates my idea of how these two problems may be > > addressed. It introduces a new global sysfs attribute, > > /sys/power/wakeup_count, associated with a running counter of wakeup events > > and a helper function, pm_wakeup_event(), that may be used by kernel subsystems > > to increment the wakeup events counter. > > In what way is this better than suspend blockers? 1) I don't think suspend blockers really solve or effectively articulate the suspend wake event race problem. 2) the partial solution suspend blocker offer for the suspend race is tightly bound to the suspend blocker implementation and not useful in more general contexts. > > /sys/power/wakeup_count may be read from or written to by user space. Reads > > will always succeed and return the current value of the wakeup events counter. > > Writes, however, will only succeed if the written number is equal to the > > current value of the wakeup events counter. If a write is successful, it will > > cause the kernel to save the current value of the wakeup events counter and > > to compare the saved number with the current value of the counter at certain > > points of the subsequent suspend (or hibernate) sequence. If the two values > > don't match, the suspend will be aborted just as though a wakeup interrupt > > happened. Reading from /sys/power/wakeup_count again will turn that mechanism > > off. > > > > The assumption is that there's a user space power manager that will first > > read from /sys/power/wakeup_count. Then it will check all user space consumers > > of wakeup events known to it for unprocessed events. > > What happens if an event arrives just before you read > /sys/power/wakeup_count, but the userspace consumer doesn't realize > there is a new unprocessed event until after the power manager checks > it? Your plan is missing a critical step: the "handoff" whereby > responsibility for handling an event passes from the kernel to > userspace. > > With suspend blockers, this handoff occurs when an event queue is > emptied and its associate suspend blocker is deactivated. Or with some > kinds of events for which the Android people have not written an > explicit handoff, it occurs when a timer expires (timed suspend > blockers). The wakeup_count will need to be incremented in the same even queue the suspend blocker handoff happens. > > > If there are any, it will > > wait for them to be processed and repeat. In turn, if there are not any, > > it will try to write to /sys/power/wakeup_count and if the write is successful, > > it will write to /sys/power/state to start suspend, so if any wakeup events > > accur past that point, they will be noticed by the kernel and will eventually > > cause the suspend to be aborted. > > This shares with the other alternatives posted recently the need for a > central power-manager process. And like in-kernel suspend blockers, it > requires changes to wakeup-capable drivers (the wakeup-events counter > has to be incremented). true. > One advantage of the suspend-blocker approach is that it essentially > uses a single tool to handle both kinds of races (event not fully > handled by the kernel, or event not fully handled by userspace). > Things aren't quite this simple, because of the need for a special API > to implement userspace suspend blockers, but this does avoid the need > for a power-manager process. I don't think suspend-blocker handles both kinds of races as well as you seem to think. A single tool that conflates multiple kernel facilities is not an advantage. It limits applications outside of the scope of doing it the "android way". Where do you get the idea that there isn't a need for a centralized PM process in Android? Thats not true. > > In addition to the above, the patch adds a wakeup events counter to the > > power member of struct device and makes these per-device wakeup event counters > > available via sysfs, so that it's possible to check the activity of various > > wakeup event sources within the kernel. > > > > To illustrate how subsystems can use pm_wakeup_event(), I added it to the > > PCI runtime PM wakeup-handling code. > > > > At the moment the patch only contains code changes (ie. no documentation), > > but I'm going to add comments etc. if people like the idea. > > > > Please tell me what you think. > > While this isn't a bad idea, I don't see how it is superior to the > other alternatives that have been proposed. > I like my PM-event framework better but I think this is ok too. --mgross ^ permalink raw reply [flat|nested] 75+ messages in thread
* Re: [RFC][PATCH] PM: Avoid losing wakeup events during suspend 2010-06-20 22:58 ` mark gross @ 2010-06-21 2:33 ` Alan Stern 2010-06-21 4:04 ` [linux-pm] " David Brownell 2010-06-21 5:55 ` mark gross 0 siblings, 2 replies; 75+ messages in thread From: Alan Stern @ 2010-06-21 2:33 UTC (permalink / raw) To: markgross Cc: Rafael J. Wysocki, linux-pm, Matthew Garrett, Linux Kernel Mailing List, Dmitry Torokhov, Arve Hjønnevåg, Neil Brown, mark gross On Sun, 20 Jun 2010, mark gross wrote: > > Indeed, the same problem arises if the event isn't delivered to > > userspace until after userspace is frozen. Of course, the underlying > > issue here is that the kernel has no direct way to know when userspace > > has finished processing an event. Userspace would have to tell it, > > which generally would mean rewriting some large number of user > > programs. > > Suspend blockers don't solve this, and the large number of user programs > isn't a big number. /me thinks its 1 or 2 services. On Android, perhaps. What about on other systems? > > In what way is this better than suspend blockers? > > 1) I don't think suspend blockers really solve or effectively articulate > the suspend wake event race problem. Why not? > 2) the partial solution suspend blocker offer for the suspend race is > tightly bound to the suspend blocker implementation and not useful in > more general contexts. I don't understand. Can you explain further? > > One advantage of the suspend-blocker approach is that it essentially > > uses a single tool to handle both kinds of races (event not fully > > handled by the kernel, or event not fully handled by userspace). > > Things aren't quite this simple, because of the need for a special API > > to implement userspace suspend blockers, but this does avoid the need > > for a power-manager process. > > I don't think suspend-blocker handles both kinds of races as well as you > seem to think. Can you give any counterexamples? > A single tool that conflates multiple kernel facilities > is not an advantage. It limits applications outside of the scope of > doing it the "android way". I don't see that necessarily as a drawback. You might just as well say that the entire Linux kernel limits applications to doing things the "Unix" way. Often have fewer choices is an advantage. > Where do you get the idea that there isn't a need for a centralized PM > process in Android? Thats not true. I didn't get that idea from anywhere. Sorry if I gave the wrong impression -- I meant that non-Android systems would need to implement a centralized power-manager process, along with all the necessary changes to other programs. (Incidentally, even on Android the centralized PM process does not handle _all_ of the userspace/kernel wakelock interactions.) Alan Stern ^ permalink raw reply [flat|nested] 75+ messages in thread
* Re: [linux-pm] [RFC][PATCH] PM: Avoid losing wakeup events during suspend 2010-06-21 2:33 ` Alan Stern @ 2010-06-21 4:04 ` David Brownell 2010-06-21 6:02 ` David Brownell 2010-06-21 15:06 ` Alan Stern 2010-06-21 5:55 ` mark gross 1 sibling, 2 replies; 75+ messages in thread From: David Brownell @ 2010-06-21 4:04 UTC (permalink / raw) To: markgross, Alan Stern Cc: mark gross, Neil Brown, Dmitry Torokhov, Linux Kernel Mailing List, linux-pm > > > Indeed, the same problem arises if the event > isn't delivered to > > > userspace until after userspace is frozen. Can we put this more directly: the problem is that the *SYSTEM ISN'T FULLY SUSPENDED* when the hardware wake event triggers? (Where "*SYSTEM* includes userspace not just kernel. In fact the overall system is built from many subsystems, some in the kernel and some in userspace. At the risk of being prematurely general: I'd point out that these subsystems probably have sequencing requirements. kernel-then-user is a degenerate case, and surely oversimplified. There are other examples, e.g. between kernel subsystems... Like needing to suspend a PMIC before the bus it uses, where that bus uses a task to manage request/response protocols. (Think I2C or SPI.) This is like the __init/__exit sequencing mess... In terms of userspace event delivery, I'd say it's a bug in the event mechanism if taking the next step in suspension drops any event. It should be queued, not lost... As a rule the hardware queuing works (transparently)... > Of course, the underlying > > > issue here is that the kernel has no direct way > to know when userspace > > > has finished processing an event. Again said more directly: there's no current mechanism to coordinate subsystems. Userspace can't communicate "I'm ready" to kernel, and vice versa. (a few decades ago, APM could do that ... we dropped such mechanisms though, and I'm fairly sure APM's implementation was holey.) ^ permalink raw reply [flat|nested] 75+ messages in thread
* Re: [linux-pm] [RFC][PATCH] PM: Avoid losing wakeup events during suspend 2010-06-21 4:04 ` [linux-pm] " David Brownell @ 2010-06-21 6:02 ` David Brownell 2010-06-21 15:06 ` Alan Stern 1 sibling, 0 replies; 75+ messages in thread From: David Brownell @ 2010-06-21 6:02 UTC (permalink / raw) To: markgross, Alan Stern Cc: Neil Brown, linux-pm, Dmitry Torokhov, Linux Kernel Mailing List, mark gross --- On Sun, 6/20/10, David Brownell <david-b@pacbell.net> wrote: ... in a sort of "aren't we asking the wrong questions??" manner ... I suspect that looking at the problem in terms of how to coordinate subsystems (an abstraction which is at best very ad-hoc today!) we would end up with a cleaner model, which doesn't bother so many folk the ay wakelocks or even suspend blockers seem to bother them... > From: David Brownell <david-b@pacbell.net> > Subject: Re: [linux-pm] [RFC][PATCH] PM: Avoid losing wakeup events during suspend > To: markgross@thegnar.org, "Alan Stern" <stern@rowland.harvard.edu> > Cc: "Neil Brown" <neilb@suse.de>, linux-pm@lists.linux-foundation.org, "Dmitry Torokhov" <dmitry.torokhov@gmail.com>, "Linux Kernel Mailing List" <linux-kernel@vger.kernel.org>, "mark gross" <640e9920@gmail.com> > Date: Sunday, June 20, 2010, 9:04 PM > > > > > Indeed, the same problem arises if the > event > > isn't delivered to > > > > userspace until after userspace is frozen. > > Can we put this more directly: the problem is > that the *SYSTEM ISN'T FULLY SUSPENDED* when the > hardware wake event triggers? (Where "*SYSTEM* > includes userspace not just kernel. In fact the > overall system is built from many subsystems, > some in the kernel and some in userspace. > > At the risk of being prematurely general: I'd > point out that these subsystems probably have > sequencing requirements. kernel-then-user is > a degenerate case, and surely oversimplified. > There are other examples, e.g. between kernel > subsystems... Like needing to suspend a PMIC > before the bus it uses, where that bus uses > a task to manage request/response protocols. > (Think I2C or SPI.) > > This is like the __init/__exit sequencing mess... > > In terms of userspace event delivery, I'd say > it's a bug in the event mechanism if taking the > next step in suspension drops any event. It > should be queued, not lost... As a rule the > hardware queuing works (transparently)... > > > Of course, the underlying > > > > issue here is that the kernel has no direct > way > > to know when userspace > > > > has finished processing an event. > > > Again said more directly: there's no current > mechanism to coordinate subsystems. Userspace > can't communicate "I'm ready" to kernel, and > vice versa. (a few decades ago, APM could do > that ... we dropped such mechanisms though, and > I'm fairly sure APM's implementation was holey.) > > > > > _______________________________________________ > linux-pm mailing list > linux-pm@lists.linux-foundation.org > https://lists.linux-foundation.org/mailman/listinfo/linux-pm > ^ permalink raw reply [flat|nested] 75+ messages in thread
* Re: [linux-pm] [RFC][PATCH] PM: Avoid losing wakeup events during suspend 2010-06-21 4:04 ` [linux-pm] " David Brownell 2010-06-21 6:02 ` David Brownell @ 2010-06-21 15:06 ` Alan Stern 1 sibling, 0 replies; 75+ messages in thread From: Alan Stern @ 2010-06-21 15:06 UTC (permalink / raw) To: David Brownell Cc: Florian Mickler, markgross, mark gross, Neil Brown, Dmitry Torokhov, Linux Kernel Mailing List, Linux-pm mailing list On Sun, 20 Jun 2010, David Brownell wrote: > Can we put this more directly: the problem is > that the *SYSTEM ISN'T FULLY SUSPENDED* when the > hardware wake event triggers? (Where "*SYSTEM* > includes userspace not just kernel. In fact the > overall system is built from many subsystems, > some in the kernel and some in userspace. Indeed, the system may not even be partially suspended when the wake event triggers. > At the risk of being prematurely general: I'd > point out that these subsystems probably have > sequencing requirements. kernel-then-user is > a degenerate case, and surely oversimplified. > There are other examples, e.g. between kernel > subsystems... Like needing to suspend a PMIC > before the bus it uses, where that bus uses > a task to manage request/response protocols. > (Think I2C or SPI.) > > This is like the __init/__exit sequencing mess... > > In terms of userspace event delivery, I'd say > it's a bug in the event mechanism if taking the > next step in suspension drops any event. It > should be queued, not lost... As a rule the > hardware queuing works (transparently)... There may be a misunderstanding here... People talk about events getting lost, but what they (usually) mean is that the event isn't actually _dropped_ -- rather, it fails to trigger a wakeup or to prevent a suspend. When something else causes the system to resume later on, the event will be delivered normally. This means that the problem is not one of sequencing. The problem is twofold: To recognize when a wakeup event has occurred and therefore it is not now safe to allow the system to suspend; And to recognize when a wakeup event has been completely handled and therefore it is once again safe to allow the system to suspend. > > Of course, the underlying > > > > issue here is that the kernel has no direct way > > to know when userspace > > > > has finished processing an event. > > > Again said more directly: there's no current > mechanism to coordinate subsystems. Userspace > can't communicate "I'm ready" to kernel, and > vice versa. (a few decades ago, APM could do > that ... we dropped such mechanisms though, and > I'm fairly sure APM's implementation was holey.) Yes, that's a better way of putting it. And it's not just a matter of "userspace communicating with the kernel", because userspace is not monolithic. There has to be a way for one user process to communicate this information to another (I like Florian's idea). Of course, the kernel doesn't have to worry about those details. If one accepts a scheme in which all the suspend initiations and cancellations are carried out by a single process (a power-manager process), then the difficulties of communication and coordination between the kernel and userspace are minimized. Alan Stern ^ permalink raw reply [flat|nested] 75+ messages in thread
* Re: [RFC][PATCH] PM: Avoid losing wakeup events during suspend 2010-06-21 2:33 ` Alan Stern 2010-06-21 4:04 ` [linux-pm] " David Brownell @ 2010-06-21 5:55 ` mark gross 2010-06-21 12:39 ` Florian Mickler 2010-06-21 15:57 ` Alan Stern 1 sibling, 2 replies; 75+ messages in thread From: mark gross @ 2010-06-21 5:55 UTC (permalink / raw) To: Alan Stern Cc: markgross, Rafael J. Wysocki, linux-pm, Matthew Garrett, Linux Kernel Mailing List, Dmitry Torokhov, Arve Hjønnevåg, Neil Brown, mark gross On Sun, Jun 20, 2010 at 10:33:01PM -0400, Alan Stern wrote: > On Sun, 20 Jun 2010, mark gross wrote: > > > > Indeed, the same problem arises if the event isn't delivered to > > > userspace until after userspace is frozen. Of course, the underlying > > > issue here is that the kernel has no direct way to know when userspace > > > has finished processing an event. Userspace would have to tell it, > > > which generally would mean rewriting some large number of user > > > programs. > > > > Suspend blockers don't solve this, and the large number of user programs > > isn't a big number. /me thinks its 1 or 2 services. > > On Android, perhaps. What about on other systems? This is just speculation but, I would expect other systems would have the graphics, input, telephony, and any PM services wake_count / wake event ware. I can't imagine (at the moment) anything else that would care. > > > > In what way is this better than suspend blockers? > > > > 1) I don't think suspend blockers really solve or effectively articulate > > the suspend wake event race problem. > > Why not? For starters the suspend block patch is about suspend blocking, at least at first before competing ideas starting coming out and this race issue was offered up as an excuse to not consider the alternative ideas, then suddenly suspend blocking became a wake event notification mechanism implementation that was baked into the implementation. The arguments are still a blur and irritating to recall / look up again. But, the point I'm trying to make is that wake event serialization / event handling suddenly became a big FUD-pie tightly bound to a specific feature for suspend blocking (or was is auto suspending? Its a magical solution to all the PM problems in the kernel. I'm being sarcastic.) Its much better to call out the issue and provide a clean targeted solution to it without binding it to some other solution to a different problem. > > > 2) the partial solution suspend blocker offer for the suspend race is > > tightly bound to the suspend blocker implementation and not useful in > > more general contexts. > > I don't understand. Can you explain further? > > > > One advantage of the suspend-blocker approach is that it essentially > > > uses a single tool to handle both kinds of races (event not fully > > > handled by the kernel, or event not fully handled by userspace). > > > Things aren't quite this simple, because of the need for a special API > > > to implement userspace suspend blockers, but this does avoid the need > > > for a power-manager process. > > > > I don't think suspend-blocker handles both kinds of races as well as you > > seem to think. > > Can you give any counterexamples? I knew you'd ask such a thing. Do you have any correctness proofs of it working right? Lets consider the RIL incoming call race: 1) user mode initiates an "opportunistic suspend" or whatever we call this these days by writing to some sys interface. 2) a call comes in (multi-threaded cpu) just after the write. 3) the caif driver grabs a blocker, stopping the in flight suspend in the nick of time. But, when should it release it without racing? How does one implement that kernel code such that its portable to non-android user mode stacks? > > > A single tool that conflates multiple kernel facilities > > is not an advantage. It limits applications outside of the scope of > > doing it the "android way". > > I don't see that necessarily as a drawback. You might just as well say > that the entire Linux kernel limits applications to doing things the > "Unix" way. Often have fewer choices is an advantage. I disagree with your analogy. One problem one solution with minimal coupling to other implementation details is nice. Two problems with one solution dependent on the system wide architecture bound to a user mode PM design is fragile and not portable. > > Where do you get the idea that there isn't a need for a centralized PM > > process in Android? Thats not true. > > I didn't get that idea from anywhere. Sorry if I gave the wrong > impression -- I meant that non-Android systems would need to implement > a centralized power-manager process, along with all the necessary > changes to other programs. > > (Incidentally, even on Android the centralized PM process does not > handle _all_ of the userspace/kernel wakelock interactions.) > Yeah, I've been told that too. I've grepped for where the wake_unlock sysfs interfaces are hit from user mode android (eclair) and I would like some help in identifying those guys. Maybe its in the FroYo code I don't get to look at yet? There is libhardware_legacy/power/power.c and an out of tree kernel broadcom bcm4329 driver under hardware/broadcom but that doesn't count as it looks like a kernel driver to me. --mgross ^ permalink raw reply [flat|nested] 75+ messages in thread
* Re: [RFC][PATCH] PM: Avoid losing wakeup events during suspend 2010-06-21 5:55 ` mark gross @ 2010-06-21 12:39 ` Florian Mickler 2010-06-21 15:57 ` Alan Stern 1 sibling, 0 replies; 75+ messages in thread From: Florian Mickler @ 2010-06-21 12:39 UTC (permalink / raw) To: markgross Cc: 640e9920, Alan Stern, Rafael J. Wysocki, linux-pm, Matthew Garrett, Linux Kernel Mailing List, Dmitry Torokhov, Arve Hjønnevåg, Neil Brown On Sun, 20 Jun 2010 22:55:22 -0700 mark gross <640e9920@gmail.com> wrote: > On Sun, Jun 20, 2010 at 10:33:01PM -0400, Alan Stern wrote: > > On Sun, 20 Jun 2010, mark gross wrote: > > > > In what way is this better than suspend blockers? > > > > > > 1) I don't think suspend blockers really solve or effectively articulate > > > the suspend wake event race problem. > > > > Why not? > > For starters the suspend block patch is about suspend blocking, at least > at first before competing ideas starting coming out and this race issue > was offered up as an excuse to not consider the alternative ideas, then > suddenly suspend blocking became a wake event notification mechanism > implementation that was baked into the implementation. The arguments > are still a blur and irritating to recall / look up again. I really can't follow you here. Userspace and kernelspace actively blocking suspend when necessary guarantees that the race is won by the right party. I.e. the race problem is solved. You can view it as a wake event notification if you want. But that's just another point of view. > > But, the point I'm trying to make is that wake event serialization / > event handling suddenly became a big FUD-pie tightly bound to a specific > feature for suspend blocking (or was is auto suspending? Its a magical > solution to all the PM problems in the kernel. I'm being sarcastic.) > > Its much better to call out the issue and provide a clean targeted > solution to it without binding it to some other solution to a different > problem. See my other mail in response to Alan Stern. In the case of a userspace-suspend-manager, if we just use this wake-event-counting mechanism, we have to make shure userspace get's scheduled and notifies the suspend-manager with an 'all clear' before it suspends the machine. Else it can happen for the userspace-manager to suspend before userspace could notify the userspace-manager to not suspend. Cheers, Flo ^ permalink raw reply [flat|nested] 75+ messages in thread
* Re: [RFC][PATCH] PM: Avoid losing wakeup events during suspend 2010-06-21 5:55 ` mark gross 2010-06-21 12:39 ` Florian Mickler @ 2010-06-21 15:57 ` Alan Stern 2010-06-22 1:58 ` mark gross 1 sibling, 1 reply; 75+ messages in thread From: Alan Stern @ 2010-06-21 15:57 UTC (permalink / raw) To: markgross Cc: Rafael J. Wysocki, linux-pm, Matthew Garrett, Linux Kernel Mailing List, Dmitry Torokhov, Arve Hjønnevåg, Neil Brown, mark gross On Sun, 20 Jun 2010, mark gross wrote: > > > 1) I don't think suspend blockers really solve or effectively articulate > > > the suspend wake event race problem. > > > > Why not? > > For starters the suspend block patch is about suspend blocking, at least > at first before competing ideas starting coming out and this race issue > was offered up as an excuse to not consider the alternative ideas, then > suddenly suspend blocking became a wake event notification mechanism > implementation that was baked into the implementation. The arguments > are still a blur and irritating to recall / look up again. I get the feeling you didn't fully absorb the intent of the original wakelock patches. Right from the start they were about fixing a race in system suspend: The system may go to sleep even though a pending wakeup event should have blocked or prevented the suspend. In particular that means notifying the PM core about wakeup events, when they occur, and when the system may once again be allowed to suspend. The userspace parts of the original patches did cloud the issue, admittedly. But the purpose of the in-kernel parts has always been clear. > But, the point I'm trying to make is that wake event serialization / > event handling suddenly became a big FUD-pie tightly bound to a specific > feature for suspend blocking (or was is auto suspending? Its a magical > solution to all the PM problems in the kernel. I'm being sarcastic.) > > Its much better to call out the issue and provide a clean targeted > solution to it without binding it to some other solution to a different > problem. That's exactly what the wakelock patches did: They called out the issue of the system suspending even while there were pending wakeup events, they provided a targeted solution, and it wasn't bound to another solution to a different problem. Indeed, if you go back through the (I agree, irritating) threads on this topic, you'll see that the FUD and other issues were all introduced by other kernel developers, mainly because they didn't like the idea of using system suspend as a mechanism for dynamic power management. > > > I don't think suspend-blocker handles both kinds of races as well as you > > > seem to think. > > > > Can you give any counterexamples? > > I knew you'd ask such a thing. Do you have any correctness proofs of it > working right? If you want, sure. But what you think "working right" means may not be the same as what I think. > Lets consider the RIL incoming call race: "RIL"? > 1) user mode initiates an "opportunistic suspend" or whatever we call > this these days by writing to some sys interface. Okay. > 2) a call comes in (multi-threaded cpu) just after the write. Right. Presumably we want the suspend to be delayed until the call can be handled by a user programm, yes? > 3) the caif driver grabs a blocker, stopping the in flight suspend in the > nick of time. But, when should it release it without racing? How does > one implement that kernel code such that its portable to non-android user > mode stacks? I don't know anything about the caif driver so I can't answer this question in detail. Here's the general idea: Somehow the kernel has to notify userspace that an incoming call has arrived. Whatever driver or subsystem sends that notification should release the suspend blocker once userspace indicates that the notification has been received (for example, by reading all the data in an input event queue). That's true for both Android and non-Android. In some cases there may not be any mechanism for userspace to tell the kernel when a notification has been received. For thoses cases, the Android people used timer-based blockers. This is not necessarily the best approach but they seem happy with it. Others might prefer to add an explicit "notification received" mechanism. Still others take the view that since suspends are initiated by userspace anyway, it's not necessary to tell the kernel when suspend is safe again. Just tell the user process responsible for initiating suspends. > > > A single tool that conflates multiple kernel facilities > > > is not an advantage. It limits applications outside of the scope of > > > doing it the "android way". > > > > I don't see that necessarily as a drawback. You might just as well say > > that the entire Linux kernel limits applications to doing things the > > "Unix" way. Often have fewer choices is an advantage. > > I disagree with your analogy. One problem one solution with minimal > coupling to other implementation details is nice. Two problems with one > solution dependent on the system wide architecture bound to a user mode > PM design is fragile and not portable. I assume the "two problems" you refer to are: telling the PM core that the kernel has a wakeup event in progress, and telling the PM core that userspace has a wakeup event in progress. To me these don't seem to be vastly different problems, and having a single solution for both doesn't seem out of line. The fact that this is bound to a user-mode PM design was inevitable, given the whole idea was to overcome weaknesses in the system suspend implementation and that implementation already is driven by userspace. > > (Incidentally, even on Android the centralized PM process does not > > handle _all_ of the userspace/kernel wakelock interactions.) > > > > Yeah, I've been told that too. I've grepped for where the wake_unlock > sysfs interfaces are hit from user mode android (eclair) and I would > like some help in identifying those guys. Maybe its in the FroYo code I > don't get to look at yet? > > There is libhardware_legacy/power/power.c and an out of tree kernel > broadcom bcm4329 driver under hardware/broadcom but that doesn't count > as it looks like a kernel driver to me. I don't know -- I have never looked at the Android userspace. No doubt Arve can provide some examples. Alan Stern ^ permalink raw reply [flat|nested] 75+ messages in thread
* Re: [RFC][PATCH] PM: Avoid losing wakeup events during suspend 2010-06-21 15:57 ` Alan Stern @ 2010-06-22 1:58 ` mark gross 2010-06-22 2:46 ` Alan Stern ` (2 more replies) 0 siblings, 3 replies; 75+ messages in thread From: mark gross @ 2010-06-22 1:58 UTC (permalink / raw) To: Alan Stern Cc: markgross, Rafael J. Wysocki, linux-pm, Matthew Garrett, Linux Kernel Mailing List, Dmitry Torokhov, Arve Hjønnevåg, Neil Brown, mark gross On Mon, Jun 21, 2010 at 11:57:21AM -0400, Alan Stern wrote: > On Sun, 20 Jun 2010, mark gross wrote: > > > > > 1) I don't think suspend blockers really solve or effectively articulate > > > > the suspend wake event race problem. > > > > > > Why not? > > > > For starters the suspend block patch is about suspend blocking, at least > > at first before competing ideas starting coming out and this race issue > > was offered up as an excuse to not consider the alternative ideas, then > > suddenly suspend blocking became a wake event notification mechanism > > implementation that was baked into the implementation. The arguments > > are still a blur and irritating to recall / look up again. > > I get the feeling you didn't fully absorb the intent of the original > wakelock patches. Right from the start they were about fixing a race > in system suspend: The system may go to sleep even though a pending wrong. They are about 1) adding opportunistic suspend, and 2) adding critical sections to block suspend. No race issues where ever talked about until alternative solutions where put up. > wakeup event should have blocked or prevented the suspend. In > particular that means notifying the PM core about wakeup events, when > they occur, and when the system may once again be allowed to suspend. > > The userspace parts of the original patches did cloud the issue, > admittedly. But the purpose of the in-kernel parts has always been > clear. > > > But, the point I'm trying to make is that wake event serialization / > > event handling suddenly became a big FUD-pie tightly bound to a specific > > feature for suspend blocking (or was is auto suspending? Its a magical > > solution to all the PM problems in the kernel. I'm being sarcastic.) > > > > Its much better to call out the issue and provide a clean targeted > > solution to it without binding it to some other solution to a different > > problem. > > That's exactly what the wakelock patches did: They called out the > issue of the system suspending even while there were pending wakeup > events, they provided a targeted solution, and it wasn't bound to > another solution to a different problem. > > Indeed, if you go back through the (I agree, irritating) threads on > this topic, you'll see that the FUD and other issues were all > introduced by other kernel developers, mainly because they didn't like > the idea of using system suspend as a mechanism for dynamic power > management. I am struck by how differently you are seeing things. > > > > > I don't think suspend-blocker handles both kinds of races as well as you > > > > seem to think. > > > > > > Can you give any counterexamples? > > > > I knew you'd ask such a thing. Do you have any correctness proofs of it > > working right? > > If you want, sure. But what you think "working right" means may not be > the same as what I think. > > > Lets consider the RIL incoming call race: > > "RIL"? You are not an android developer are you? RIL is the user mode Radio Interface Layer. > > 1) user mode initiates an "opportunistic suspend" or whatever we call > > this these days by writing to some sys interface. > > Okay. > > > 2) a call comes in (multi-threaded cpu) just after the write. > > Right. Presumably we want the suspend to be delayed until the call > can be handled by a user programm, yes? > > > 3) the caif driver grabs a blocker, stopping the in flight suspend in the > > nick of time. But, when should it release it without racing? How does > > one implement that kernel code such that its portable to non-android user > > mode stacks? > > I don't know anything about the caif driver so I can't answer this > question in detail. Here's the general idea: Somehow the kernel has to > notify userspace that an incoming call has arrived. Whatever driver or > subsystem sends that notification should release the suspend blocker > once userspace indicates that the notification has been received (for > example, by reading all the data in an input event queue). That's true > for both Android and non-Android. so tell me how user space indicates to the kernel object will ever know when its ok to release the blocker? then tell me how suspend blocker provide an elegant portable solution to this that works for multiple user mode stacks. > > In some cases there may not be any mechanism for userspace to tell the > kernel when a notification has been received. For thoses cases, the > Android people used timer-based blockers. This is not necessarily the > best approach but they seem happy with it. Others might prefer to add > an explicit "notification received" mechanism. the time-based blockers are not part of the latest patch set. > > Still others take the view that since suspends are initiated by > userspace anyway, it's not necessary to tell the kernel when suspend is > safe again. Just tell the user process responsible for initiating > suspends. > > > > > A single tool that conflates multiple kernel facilities > > > > is not an advantage. It limits applications outside of the scope of > > > > doing it the "android way". > > > > > > I don't see that necessarily as a drawback. You might just as well say > > > that the entire Linux kernel limits applications to doing things the > > > "Unix" way. Often have fewer choices is an advantage. > > > > I disagree with your analogy. One problem one solution with minimal > > coupling to other implementation details is nice. Two problems with one > > solution dependent on the system wide architecture bound to a user mode > > PM design is fragile and not portable. > > I assume the "two problems" you refer to are: telling the PM core that > the kernel has a wakeup event in progress, and telling the PM core that > userspace has a wakeup event in progress. To me these don't seem to be > vastly different problems, and having a single solution for both > doesn't seem out of line. nope. problem 1) opportunistic enable suspend deferral for critical sections when suspending is "bad" problem 2) race between entering pm_suspend call tree and wake events. > The fact that this is bound to a user-mode PM design was inevitable, > given the whole idea was to overcome weaknesses in the system suspend > implementation and that implementation already is driven by userspace. I think we can do better. > > > (Incidentally, even on Android the centralized PM process does not > > > handle _all_ of the userspace/kernel wakelock interactions.) > > > > > > > Yeah, I've been told that too. I've grepped for where the wake_unlock > > sysfs interfaces are hit from user mode android (eclair) and I would > > like some help in identifying those guys. Maybe its in the FroYo code I > > don't get to look at yet? > > > > There is libhardware_legacy/power/power.c and an out of tree kernel > > broadcom bcm4329 driver under hardware/broadcom but that doesn't count > > as it looks like a kernel driver to me. > > I don't know -- I have never looked at the Android userspace. No doubt > Arve can provide some examples. who are you? You don't know about the Android stack, and you keep blowing about how suspend blockers solve all the PM problems? I'm starting to think your just trolling. --mgross ^ permalink raw reply [flat|nested] 75+ messages in thread
* Re: [RFC][PATCH] PM: Avoid losing wakeup events during suspend 2010-06-22 1:58 ` mark gross @ 2010-06-22 2:46 ` Alan Stern 2010-06-22 9:24 ` Rafael J. Wysocki 2010-06-22 6:18 ` Florian Mickler 2010-06-22 9:29 ` Rafael J. Wysocki 2 siblings, 1 reply; 75+ messages in thread From: Alan Stern @ 2010-06-22 2:46 UTC (permalink / raw) To: markgross Cc: Rafael J. Wysocki, linux-pm, Matthew Garrett, Linux Kernel Mailing List, Dmitry Torokhov, Arve Hjønnevåg, Neil Brown, mark gross On Mon, 21 Jun 2010, mark gross wrote: > > I get the feeling you didn't fully absorb the intent of the original > > wakelock patches. Right from the start they were about fixing a race > > in system suspend: The system may go to sleep even though a pending > > wrong. They are about 1) adding opportunistic suspend, and 2) adding > critical sections to block suspend. > > No race issues where ever talked about until alternative solutions > where put up. That's not how I remember it. But this point isn't important enough to be worth digging through the old emails to verify or disprove it. > I am struck by how differently you are seeing things. Those discussions were (and still are!) so long and annoying, it wouldn't be surprising if _everybody_ saw them differently! :-) You certainly must agree that a lot of differing personal viewpoints were expressed. > > > Lets consider the RIL incoming call race: > > > > "RIL"? > > You are not an android developer are you? Nope. > RIL is the user mode Radio > Interface Layer. > > > > 1) user mode initiates an "opportunistic suspend" or whatever we call > > > this these days by writing to some sys interface. > > > > Okay. > > > > > 2) a call comes in (multi-threaded cpu) just after the write. > > > > Right. Presumably we want the suspend to be delayed until the call > > can be handled by a user programm, yes? > > > > > 3) the caif driver grabs a blocker, stopping the in flight suspend in the > > > nick of time. But, when should it release it without racing? How does > > > one implement that kernel code such that its portable to non-android user > > > mode stacks? > > > > I don't know anything about the caif driver so I can't answer this > > question in detail. Here's the general idea: Somehow the kernel has to > > notify userspace that an incoming call has arrived. Whatever driver or > > subsystem sends that notification should release the suspend blocker > > once userspace indicates that the notification has been received (for > > example, by reading all the data in an input event queue). That's true > > for both Android and non-Android. > > so tell me how user space indicates to the kernel object will ever know > when its ok to release the blocker? First you tell me how the kernel indicates to userspace that an incoming call has arrived. My answer will depend on that. > then tell me how suspend blocker provide an elegant portable solution > to this that works for multiple user mode stacks. When did I -- or anyone -- ever say it was "elegant"? As for multiple user mode stacks, I don't see the issue. If you grant there is a mechanism whereby userspace indicates to the kernel that a blocker may be released, then it seems obvious that this mechanism could be used in multiple user mode stacks. > problem 1) opportunistic enable suspend deferral for critical sections > when suspending is "bad" > problem 2) race between entering pm_suspend call tree and wake events. Can you point out any examples of kernel critical sections where suspending is "bad" that aren't started by arrival of a wakeup event? Unless you can, I'm justified in saying that these "two" problems are one and the same. > > The fact that this is bound to a user-mode PM design was inevitable, > > given the whole idea was to overcome weaknesses in the system suspend > > implementation and that implementation already is driven by userspace. > > I think we can do better. Maybe we can. I'm certainly not going to claim that suspend blockers are the best possible solution. And I'm not really keen on seeing them merged along with all the attendant opportunistic suspend and userspace API stuff. But so far I haven't heard anything significantly better. > > I don't know -- I have never looked at the Android userspace. No doubt > > Arve can provide some examples. > who are you? You sound like a Vorlon. Should I ask: "What do you want?" :-) > You don't know about the Android stack, and you keep > blowing about how suspend blockers solve all the PM problems? Not all of them. Just the race between wakeup events and suspend. > I'm > starting to think your just trolling. You may think what you like. Perhaps there is a certain degree of truth to the accusation -- goodness knows, I do tend to prolong technical conversations when I think I'm right. (Ask anybody who has corresponded with me for any length of time.) But this is probably true of a lot of kernel developers... Alan Stern ^ permalink raw reply [flat|nested] 75+ messages in thread
* Re: [RFC][PATCH] PM: Avoid losing wakeup events during suspend 2010-06-22 2:46 ` Alan Stern @ 2010-06-22 9:24 ` Rafael J. Wysocki 0 siblings, 0 replies; 75+ messages in thread From: Rafael J. Wysocki @ 2010-06-22 9:24 UTC (permalink / raw) To: Alan Stern Cc: markgross, linux-pm, Matthew Garrett, Linux Kernel Mailing List, Dmitry Torokhov, Arve Hjønnevåg, Neil Brown, mark gross On Tuesday, June 22, 2010, Alan Stern wrote: > On Mon, 21 Jun 2010, mark gross wrote: > ... > > I'm starting to think your just trolling. > > You may think what you like. Perhaps there is a certain degree of > truth to the accusation -- goodness knows, I do tend to prolong > technical conversations when I think I'm right. (Ask anybody who has > corresponded with me for any length of time.) But this is probably > true of a lot of kernel developers... I certainly don't mind discussing with you. :-) Thanks, Rafael ^ permalink raw reply [flat|nested] 75+ messages in thread
* Re: [RFC][PATCH] PM: Avoid losing wakeup events during suspend 2010-06-22 1:58 ` mark gross 2010-06-22 2:46 ` Alan Stern @ 2010-06-22 6:18 ` Florian Mickler 2010-06-22 23:22 ` mark gross 2010-06-22 9:29 ` Rafael J. Wysocki 2 siblings, 1 reply; 75+ messages in thread From: Florian Mickler @ 2010-06-22 6:18 UTC (permalink / raw) To: markgross Cc: 640e9920, Alan Stern, Rafael J. Wysocki, linux-pm, Matthew Garrett, Linux Kernel Mailing List, Dmitry Torokhov, Arve Hjønnevåg, Neil Brown On Mon, 21 Jun 2010 18:58:37 -0700 mark gross <640e9920@gmail.com> wrote: > wrong. They are about 1) adding opportunistic suspend, and 2) adding > critical sections to block suspend. > > No race issues where ever talked about until alternative solutions > where put up. The whole and only reason to even define the term "critical sections" is when you need to define "a race". Or vice versa. A race is prevented by defining critical sections and protecting these against concurrent access. [..snip..] [some rant that alan is not familiar with android userspace..] Are you suggesting that only android developers are supposed to talk about this? This is a pretty basic thing. It has only to do with system suspend. (And using system suspend aggressively) > > --mgross > Cheers, Flo ^ permalink raw reply [flat|nested] 75+ messages in thread
* Re: [RFC][PATCH] PM: Avoid losing wakeup events during suspend 2010-06-22 6:18 ` Florian Mickler @ 2010-06-22 23:22 ` mark gross 0 siblings, 0 replies; 75+ messages in thread From: mark gross @ 2010-06-22 23:22 UTC (permalink / raw) To: Florian Mickler Cc: markgross, 640e9920, Alan Stern, Rafael J. Wysocki, linux-pm, Matthew Garrett, Linux Kernel Mailing List, Dmitry Torokhov, Arve Hjønnevåg, Neil Brown On Tue, Jun 22, 2010 at 08:18:03AM +0200, Florian Mickler wrote: > On Mon, 21 Jun 2010 18:58:37 -0700 > mark gross <640e9920@gmail.com> wrote: > > > > wrong. They are about 1) adding opportunistic suspend, and 2) adding > > critical sections to block suspend. > > > > No race issues where ever talked about until alternative solutions > > where put up. > > The whole and only reason to even define the term "critical sections" is > when you need to define "a race". Or vice versa. A race is prevented by > defining critical sections and protecting these against concurrent > access. > > [..snip..] > > [some rant that alan is not familiar with android userspace..] > > Are you suggesting that only android developers are supposed to talk > about this? of course not. I'm just getting frustrated with having android-isms tossed in my face as we try to discuss the merits of the ideas, only to find that the are getting tossed around by someone not familiar with Android. Sorry about that. I was having a bad day. --mgross > > This is a pretty basic thing. It has only to do with system suspend. > (And using system suspend aggressively) > > > > > --mgross > > > > Cheers, > Flo ^ permalink raw reply [flat|nested] 75+ messages in thread
* Re: [RFC][PATCH] PM: Avoid losing wakeup events during suspend 2010-06-22 1:58 ` mark gross 2010-06-22 2:46 ` Alan Stern 2010-06-22 6:18 ` Florian Mickler @ 2010-06-22 9:29 ` Rafael J. Wysocki 2 siblings, 0 replies; 75+ messages in thread From: Rafael J. Wysocki @ 2010-06-22 9:29 UTC (permalink / raw) To: markgross Cc: Alan Stern, linux-pm, Matthew Garrett, Linux Kernel Mailing List, Dmitry Torokhov, Arve Hjønnevåg, Neil Brown, mark gross On Tuesday, June 22, 2010, mark gross wrote: > On Mon, Jun 21, 2010 at 11:57:21AM -0400, Alan Stern wrote: > > On Sun, 20 Jun 2010, mark gross wrote: ... > > I don't know -- I have never looked at the Android userspace. No doubt > > Arve can provide some examples. > who are you? You don't know about the Android stack, and you keep > blowing about how suspend blockers solve all the PM problems? I'm > starting to think your just trolling. And that's just an excellent way of convincing people that you're right. Guess how it worked on me. For your information, Alan is one of the people most actively engaged in power management development at the kernel level and his ideas have heavily influenced the design of the I/O runtime PM framework, among other things. I certainly appeciate Alan's opinions very much, because he usually is technically right. Thanks, Rafael ^ permalink raw reply [flat|nested] 75+ messages in thread
end of thread, other threads:[~2010-06-25 20:58 UTC | newest] Thread overview: 75+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2010-06-19 22:05 [RFC][PATCH] PM: Avoid losing wakeup events during suspend Rafael J. Wysocki 2010-06-20 5:52 ` mark gross 2010-06-20 12:49 ` Rafael J. Wysocki 2010-06-20 23:13 ` mark gross 2010-06-20 16:28 ` Alan Stern 2010-06-20 21:50 ` Rafael J. Wysocki 2010-06-21 2:23 ` Alan Stern 2010-06-21 5:32 ` Florian Mickler 2010-06-21 15:23 ` Alan Stern 2010-06-21 20:38 ` Florian Mickler 2010-06-21 22:18 ` Alan Stern 2010-06-21 22:40 ` Rafael J. Wysocki 2010-06-21 22:48 ` Rafael J. Wysocki 2010-06-22 0:50 ` Arve Hjønnevåg 2010-06-22 10:21 ` Rafael J. Wysocki 2010-06-22 14:35 ` Alan Stern 2010-06-22 15:35 ` Rafael J. Wysocki 2010-06-22 19:55 ` Alan Stern 2010-06-22 20:58 ` Rafael J. Wysocki 2010-06-22 19:59 ` [update] " Rafael J. Wysocki 2010-06-22 20:34 ` Alan Stern 2010-06-22 21:41 ` Rafael J. Wysocki 2010-06-23 2:12 ` Alan Stern 2010-06-23 10:09 ` Rafael J. Wysocki 2010-06-23 15:21 ` Alan Stern 2010-06-23 22:17 ` Rafael J. Wysocki 2010-06-24 13:13 ` [update 2] " Rafael J. Wysocki 2010-06-24 15:06 ` Rafael J. Wysocki 2010-06-24 15:35 ` Alan Stern 2010-06-24 23:00 ` [update 3] " Rafael J. Wysocki 2010-06-25 14:42 ` Alan Stern 2010-06-25 20:33 ` Rafael J. Wysocki 2010-06-24 15:44 ` [update 2] " Alan Stern 2010-06-24 16:19 ` Rafael J. Wysocki 2010-06-24 17:09 ` Alan Stern 2010-06-24 23:06 ` Rafael J. Wysocki 2010-06-25 15:09 ` Alan Stern 2010-06-25 20:37 ` Rafael J. Wysocki 2010-06-25 20:57 ` Alan Stern 2010-06-25 6:40 ` Florian Mickler 2010-06-25 13:28 ` Rafael J. Wysocki [not found] ` <201006222159.28081.rjw__37084.1419128284$1277237903$gmane$org@sisk.pl> 2010-06-24 14:16 ` [update] " Andy Lutomirski 2010-06-24 14:45 ` Alan Stern 2010-06-24 14:48 ` Rafael J. Wysocki 2010-06-24 15:21 ` Andy Lutomirski 2010-06-22 23:00 ` mark gross 2010-06-21 16:54 ` Alan Stern 2010-06-21 20:40 ` Florian Mickler 2010-06-21 21:18 ` Rafael J. Wysocki 2010-06-21 22:27 ` Alan Stern 2010-06-21 6:13 ` mark gross 2010-06-21 12:10 ` tytso 2010-06-21 12:22 ` Alan Cox 2010-06-21 12:26 ` Florian Mickler 2010-06-21 13:42 ` tytso 2010-06-21 14:01 ` Alan Cox 2010-06-22 1:07 ` mark gross 2010-06-21 16:01 ` Alan Stern 2010-06-22 1:25 ` mark gross 2010-06-22 2:24 ` Alan Stern 2010-06-21 21:58 ` Rafael J. Wysocki 2010-06-20 22:58 ` mark gross 2010-06-21 2:33 ` Alan Stern 2010-06-21 4:04 ` [linux-pm] " David Brownell 2010-06-21 6:02 ` David Brownell 2010-06-21 15:06 ` Alan Stern 2010-06-21 5:55 ` mark gross 2010-06-21 12:39 ` Florian Mickler 2010-06-21 15:57 ` Alan Stern 2010-06-22 1:58 ` mark gross 2010-06-22 2:46 ` Alan Stern 2010-06-22 9:24 ` Rafael J. Wysocki 2010-06-22 6:18 ` Florian Mickler 2010-06-22 23:22 ` mark gross 2010-06-22 9:29 ` Rafael J. Wysocki
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).