linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v6 0/6] solve deadlock caused by memory allocation with I/O
@ 2012-11-24 12:59 Ming Lei
  2012-11-24 12:59 ` [PATCH v6 1/6] mm: teach mm by current context info to not do I/O during memory allocation Ming Lei
                   ` (6 more replies)
  0 siblings, 7 replies; 19+ messages in thread
From: Ming Lei @ 2012-11-24 12:59 UTC (permalink / raw)
  To: linux-kernel
  Cc: Alan Stern, Oliver Neukum, Minchan Kim, Greg Kroah-Hartman,
	Rafael J. Wysocki, Jens Axboe, David S. Miller, Andrew Morton,
	netdev, linux-usb, linux-pm, linux-mm

Hi,

This patchset try to solve one deadlock problem which might be caused
by memory allocation with block I/O during runtime PM and block device
error handling path. Traditionly, the problem is addressed by passing
GFP_NOIO statically to mm, but that is not a effective solution, see
detailed description in patch 1's commit log.

This patch set introduces one process flag and trys to fix the deadlock
problem on block device/network device during runtime PM or usb bus reset.

The 1st one is the change on include/sched.h and mm.

The 2nd patch introduces the flag of memalloc_noio on 'dev_pm_info',
and pm_runtime_set_memalloc_noio(), so that PM Core can teach mm to not
allocate mm with GFP_IO during the runtime_resume callback only on
device with the flag set.

The following 2 patches apply the introduced pm_runtime_set_memalloc_noio()
to mark all devices as memalloc_noio_resume in the path from the block or
network device to the root device in device tree.

The last 2 patches are applied again PM and USB subsystem to demonstrate
how to use the introduced mechanism to fix the deadlock problem.

Andrew, could you queue these patches into your tree since V6 fixes all
your concerns and looks no one objects these patches?

Change logs:
V6:
	- fix one compile failure(1/6), and only one line change

V5:
        - don't clear GFP_FS
        - coding style fix
        - add comments
        - see details in individual change logs

V4:
        - patches from the 2nd to the 6th changed
        - call pm_runtime_set_memalloc_noio() after device_add() as pointed
        by Alan
        - set PF_MEMALLOC_NOIO during runtime_suspend()

V3:
        - patch 2/6 and 5/6 changed, see their commit log
        - remove RFC from title since several guys have expressed that
        it is a reasonable solution
V2:
        - remove changes on 'may_writepage' and 'may_swap'(1/6)
        - unset GFP_IOFS in try_to_free_pages() path(1/6)
        - introduce pm_runtime_set_memalloc_noio()
        - only apply the meachnism on block/network device and its ancestors
        for runtime resume context
V1:
        - take Minchan's change to avoid the check in alloc_page hot path
        - change the helpers' style into save/restore as suggested by Alan
        - memory allocation with no io in usb bus reset path for all devices
        as suggested by Greg and Oliver

 block/genhd.c                |   10 +++++
 drivers/base/power/runtime.c |   92 +++++++++++++++++++++++++++++++++++++++++-
 drivers/usb/core/hub.c       |   13 ++++++
 include/linux/pm.h           |    1 +
 include/linux/pm_runtime.h   |    3 ++
 include/linux/sched.h        |   22 ++++++++++
 mm/page_alloc.c              |    9 ++++-
 mm/vmscan.c                  |    4 +-
 net/core/net-sysfs.c         |    5 +++
 9 files changed, 154 insertions(+), 5 deletions(-)

Thanks,
--
Ming Lei


^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH v6 1/6] mm: teach mm by current context info to not do I/O during memory allocation
  2012-11-24 12:59 [PATCH v6 0/6] solve deadlock caused by memory allocation with I/O Ming Lei
@ 2012-11-24 12:59 ` Ming Lei
  2012-11-24 12:59 ` [PATCH v6 2/6] PM / Runtime: introduce pm_runtime_set_memalloc_noio() Ming Lei
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 19+ messages in thread
From: Ming Lei @ 2012-11-24 12:59 UTC (permalink / raw)
  To: linux-kernel
  Cc: Alan Stern, Oliver Neukum, Minchan Kim, Greg Kroah-Hartman,
	Rafael J. Wysocki, Jens Axboe, David S. Miller, Andrew Morton,
	netdev, linux-usb, linux-pm, linux-mm, Ming Lei, Jiri Kosina,
	Mel Gorman, KAMEZAWA Hiroyuki, Michal Hocko, Ingo Molnar,
	Peter Zijlstra

This patch introduces PF_MEMALLOC_NOIO on process flag('flags' field of
'struct task_struct'), so that the flag can be set by one task
to avoid doing I/O inside memory allocation in the task's context.

The patch trys to solve one deadlock problem caused by block device,
and the problem may happen at least in the below situations:

- during block device runtime resume, if memory allocation with
GFP_KERNEL is called inside runtime resume callback of any one
of its ancestors(or the block device itself), the deadlock may be
triggered inside the memory allocation since it might not complete
until the block device becomes active and the involed page I/O finishes.
The situation is pointed out first by Alan Stern. It is not a good
approach to convert all GFP_KERNEL[1] in the path into GFP_NOIO because
several subsystems may be involved(for example, PCI, USB and SCSI may
be involved for usb mass stoarage device, network devices involved too
in the iSCSI case)

- during block device runtime suspend, because runtime resume need
to wait for completion of concurrent runtime suspend.

- during error handling of usb mass storage deivce, USB bus reset
will be put on the device, so there shouldn't have any
memory allocation with GFP_KERNEL during USB bus reset, otherwise
the deadlock similar with above may be triggered. Unfortunately, any
usb device may include one mass storage interface in theory, so it
requires all usb interface drivers to handle the situation. In fact,
most usb drivers don't know how to handle bus reset on the device
and don't provide .pre_set() and .post_reset() callback at all, so
USB core has to unbind and bind driver for these devices. So it
is still not practical to resort to GFP_NOIO for solving the problem.

Also the introduced solution can be used by block subsystem or block
drivers too, for example, set the PF_MEMALLOC_NOIO flag before doing
actual I/O transfer.

It is not a good idea to convert all these GFP_KERNEL in the
affected path into GFP_NOIO because these functions doing that may be
implemented as library and will be called in many other contexts.

In fact, memalloc_noio_flags() can convert some of current static GFP_NOIO
allocation into GFP_KERNEL back in other non-affected contexts, at least
almost all GFP_NOIO in USB subsystem can be converted into GFP_KERNEL
after applying the approach and make allocation with GFP_NOIO
only happen in runtime resume/bus reset/block I/O transfer contexts
generally.

[1], several GFP_KERNEL allocation examples in runtime resume path

- pci subsystem
acpi_os_allocate
	<-acpi_ut_allocate
		<-ACPI_ALLOCATE_ZEROED
			<-acpi_evaluate_object
				<-__acpi_bus_set_power
					<-acpi_bus_set_power
						<-acpi_pci_set_power_state
							<-platform_pci_set_power_state
								<-pci_platform_power_transition
									<-__pci_complete_power_transition
										<-pci_set_power_state
											<-pci_restore_standard_config
												<-pci_pm_runtime_resume
- usb subsystem
usb_get_status
	<-finish_port_resume
		<-usb_port_resume
			<-generic_resume
				<-usb_resume_device
					<-usb_resume_both
						<-usb_runtime_resume

- some individual usb drivers
usblp, uvc, gspca, most of dvb-usb-v2 media drivers, cpia2, az6007, ....

That is just what I have found.  Unfortunately, this allocation can
only be found by human being now, and there should be many not found
since any function in the resume path(call tree) may allocate memory
with GFP_KERNEL.

Cc: Alan Stern <stern@rowland.harvard.edu>
Cc: Oliver Neukum <oneukum@suse.de>
Cc: Jiri Kosina <jiri.kosina@suse.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
Signed-off-by: Minchan Kim <minchan@kernel.org>
Signed-off-by: Ming Lei <ming.lei@canonical.com>
---
v6:
	- replace GFP_IO with __GFP_IO to fix compile failure

v5:
	- use inline instead of macro to define memalloc_noio_*
	- replace memalloc_noio() with memalloc_noio_flags() to
	make code neater
	- don't clear GFP_FS because no GFP_IO means
	that allocation won't enter device driver as pointed by
	Andrew Morton

v4:
	- fix comment
v3:
	- no change
v2:
        - remove changes on 'may_writepage' and 'may_swap' because that
          isn't related with the patchset, and can't introduce I/O in
          allocation path if GFP_IOFS is unset, so handing 'may_swap'
          and may_writepage on GFP_NOIO or GFP_NOFS  should be a
          mm internal thing, and let mm guys deal with that, :-).

          Looks clearing the two may_XXX flag only excludes dirty pages
	  	  and anon pages for relaiming, and the behaviour should be decided
          by GFP FLAG, IMO.

        - unset GFP_IOFS in try_to_free_pages() path since
          alloc_page_buffers()
          and dma_alloc_from_contiguous may drop into the path, as
          pointed by KAMEZAWA Hiroyuki
v1:
        - take Minchan's change to avoid the check in alloc_page hot
          path

        - change the helpers' style into save/restore as suggested by
          Alan Stern
---
 include/linux/sched.h |   22 ++++++++++++++++++++++
 mm/page_alloc.c       |    9 ++++++++-
 mm/vmscan.c           |    4 ++--
 3 files changed, 32 insertions(+), 3 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 85d14e1..c9bb377 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -51,6 +51,7 @@ struct sched_param {
 #include <linux/cred.h>
 #include <linux/llist.h>
 #include <linux/uidgid.h>
+#include <linux/gfp.h>
 
 #include <asm/processor.h>
 
@@ -1810,6 +1811,7 @@ extern void thread_group_times(struct task_struct *p, cputime_t *ut, cputime_t *
 #define PF_FROZEN	0x00010000	/* frozen for system suspend */
 #define PF_FSTRANS	0x00020000	/* inside a filesystem transaction */
 #define PF_KSWAPD	0x00040000	/* I am kswapd */
+#define PF_MEMALLOC_NOIO 0x00080000	/* Allocating memory without IO involved */
 #define PF_LESS_THROTTLE 0x00100000	/* Throttle me less: I clean memory */
 #define PF_KTHREAD	0x00200000	/* I am a kernel thread */
 #define PF_RANDOMIZE	0x00400000	/* randomize virtual address space */
@@ -1847,6 +1849,26 @@ extern void thread_group_times(struct task_struct *p, cputime_t *ut, cputime_t *
 #define tsk_used_math(p) ((p)->flags & PF_USED_MATH)
 #define used_math() tsk_used_math(current)
 
+/* __GFP_IO isn't allowed if PF_MEMALLOC_NOIO is set in current->flags */
+static inline gfp_t memalloc_noio_flags(gfp_t flags)
+{
+	if (unlikely(current->flags & PF_MEMALLOC_NOIO))
+		flags &= ~__GFP_IO;
+	return flags;
+}
+
+static inline gfp_t memalloc_noio_save(void)
+{
+	gfp_t flags = current->flags & PF_MEMALLOC_NOIO;
+	current->flags |= PF_MEMALLOC_NOIO;
+	return flags;
+}
+
+static inline void memalloc_noio_restore(gfp_t flags)
+{
+	current->flags = (current->flags & ~PF_MEMALLOC_NOIO) | flags;
+}
+
 /*
  * task->jobctl flags
  */
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 4d077f0..4e485b1 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -2644,10 +2644,17 @@ retry_cpuset:
 	page = get_page_from_freelist(gfp_mask|__GFP_HARDWALL, nodemask, order,
 			zonelist, high_zoneidx, alloc_flags,
 			preferred_zone, migratetype);
-	if (unlikely(!page))
+	if (unlikely(!page)) {
+		/*
+		 * Runtime PM, block IO and its error handling path
+		 * can deadlock because I/O on the device might not
+		 * complete.
+		 */
+		gfp_mask = memalloc_noio_flags(gfp_mask);
 		page = __alloc_pages_slowpath(gfp_mask, order,
 				zonelist, high_zoneidx, nodemask,
 				preferred_zone, migratetype);
+	}
 
 	trace_mm_page_alloc(page, order, gfp_mask, migratetype);
 
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 9ca84e2..b1296c4 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -2270,7 +2270,7 @@ unsigned long try_to_free_pages(struct zonelist *zonelist, int order,
 {
 	unsigned long nr_reclaimed;
 	struct scan_control sc = {
-		.gfp_mask = gfp_mask,
+		.gfp_mask = (gfp_mask = memalloc_noio_flags(gfp_mask)),
 		.may_writepage = !laptop_mode,
 		.nr_to_reclaim = SWAP_CLUSTER_MAX,
 		.may_unmap = 1,
@@ -3277,7 +3277,7 @@ static int __zone_reclaim(struct zone *zone, gfp_t gfp_mask, unsigned int order)
 		.may_swap = 1,
 		.nr_to_reclaim = max_t(unsigned long, nr_pages,
 				       SWAP_CLUSTER_MAX),
-		.gfp_mask = gfp_mask,
+		.gfp_mask = (gfp_mask = memalloc_noio_flags(gfp_mask)),
 		.order = order,
 		.priority = ZONE_RECLAIM_PRIORITY,
 	};
-- 
1.7.9.5


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH v6 2/6] PM / Runtime: introduce pm_runtime_set_memalloc_noio()
  2012-11-24 12:59 [PATCH v6 0/6] solve deadlock caused by memory allocation with I/O Ming Lei
  2012-11-24 12:59 ` [PATCH v6 1/6] mm: teach mm by current context info to not do I/O during memory allocation Ming Lei
@ 2012-11-24 12:59 ` Ming Lei
  2012-11-27 21:19   ` Rafael J. Wysocki
  2012-11-24 12:59 ` [PATCH v6 3/6] block/genhd.c: apply pm_runtime_set_memalloc_noio on block devices Ming Lei
                   ` (4 subsequent siblings)
  6 siblings, 1 reply; 19+ messages in thread
From: Ming Lei @ 2012-11-24 12:59 UTC (permalink / raw)
  To: linux-kernel
  Cc: Alan Stern, Oliver Neukum, Minchan Kim, Greg Kroah-Hartman,
	Rafael J. Wysocki, Jens Axboe, David S. Miller, Andrew Morton,
	netdev, linux-usb, linux-pm, linux-mm, Ming Lei

The patch introduces the flag of memalloc_noio in 'struct dev_pm_info'
to help PM core to teach mm not allocating memory with GFP_KERNEL
flag for avoiding probable deadlock.

As explained in the comment, any GFP_KERNEL allocation inside
runtime_resume() or runtime_suspend() on any one of device in
the path from one block or network device to the root device
in the device tree may cause deadlock, the introduced
pm_runtime_set_memalloc_noio() sets or clears the flag on
device in the path recursively.

Cc: Alan Stern <stern@rowland.harvard.edu>
Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
Signed-off-by: Ming Lei <ming.lei@canonical.com>
---
v5:
	- fix code style error
	- add comment on clear the device memalloc_noio flag
v4:
	- rename memalloc_noio_resume as memalloc_noio
	- remove pm_runtime_get_memalloc_noio()
	- add comments on pm_runtime_set_memalloc_noio
v3:
	- introduce pm_runtime_get_memalloc_noio()
	- hold one global lock on pm_runtime_set_memalloc_noio
	- hold device power lock when accessing memalloc_noio_resume
	  flag suggested by Alan Stern
	- implement pm_runtime_set_memalloc_noio without recursion
	  suggested by Alan Stern
v2:
	- introduce pm_runtime_set_memalloc_noio()
---
 drivers/base/power/runtime.c |   60 ++++++++++++++++++++++++++++++++++++++++++
 include/linux/pm.h           |    1 +
 include/linux/pm_runtime.h   |    3 +++
 3 files changed, 64 insertions(+)

diff --git a/drivers/base/power/runtime.c b/drivers/base/power/runtime.c
index 3148b10..3e198a0 100644
--- a/drivers/base/power/runtime.c
+++ b/drivers/base/power/runtime.c
@@ -124,6 +124,66 @@ unsigned long pm_runtime_autosuspend_expiration(struct device *dev)
 }
 EXPORT_SYMBOL_GPL(pm_runtime_autosuspend_expiration);
 
+static int dev_memalloc_noio(struct device *dev, void *data)
+{
+	return dev->power.memalloc_noio;
+}
+
+/*
+ * pm_runtime_set_memalloc_noio - Set a device's memalloc_noio flag.
+ * @dev: Device to handle.
+ * @enable: True for setting the flag and False for clearing the flag.
+ *
+ * Set the flag for all devices in the path from the device to the
+ * root device in the device tree if @enable is true, otherwise clear
+ * the flag for devices in the path whose siblings don't set the flag.
+ *
+ * The function should only be called by block device, or network
+ * device driver for solving the deadlock problem during runtime
+ * resume/suspend:
+ *
+ *     If memory allocation with GFP_KERNEL is called inside runtime
+ *     resume/suspend callback of any one of its ancestors(or the
+ *     block device itself), the deadlock may be triggered inside the
+ *     memory allocation since it might not complete until the block
+ *     device becomes active and the involed page I/O finishes. The
+ *     situation is pointed out first by Alan Stern. Network device
+ *     are involved in iSCSI kind of situation.
+ *
+ * The lock of dev_hotplug_mutex is held in the function for handling
+ * hotplug race because pm_runtime_set_memalloc_noio() may be called
+ * in async probe().
+ *
+ * The function should be called between device_add() and device_del()
+ * on the affected device(block/network device).
+ */
+void pm_runtime_set_memalloc_noio(struct device *dev, bool enable)
+{
+	static DEFINE_MUTEX(dev_hotplug_mutex);
+
+	mutex_lock(&dev_hotplug_mutex);
+	for (;;) {
+		/* hold power lock since bitfield is not SMP-safe. */
+		spin_lock_irq(&dev->power.lock);
+		dev->power.memalloc_noio = enable;
+		spin_unlock_irq(&dev->power.lock);
+
+		dev = dev->parent;
+
+		/*
+		 * clear flag of the parent device only if all the
+		 * children don't set the flag because ancestor's
+		 * flag was set by any one of the descendants.
+		 */
+		if (!dev || (!enable &&
+			     device_for_each_child(dev, NULL,
+						   dev_memalloc_noio)))
+			break;
+	}
+	mutex_unlock(&dev_hotplug_mutex);
+}
+EXPORT_SYMBOL_GPL(pm_runtime_set_memalloc_noio);
+
 /**
  * rpm_check_suspend_allowed - Test whether a device may be suspended.
  * @dev: Device to test.
diff --git a/include/linux/pm.h b/include/linux/pm.h
index 03d7bb1..1a8a69d 100644
--- a/include/linux/pm.h
+++ b/include/linux/pm.h
@@ -538,6 +538,7 @@ struct dev_pm_info {
 	unsigned int		irq_safe:1;
 	unsigned int		use_autosuspend:1;
 	unsigned int		timer_autosuspends:1;
+	unsigned int		memalloc_noio:1;
 	enum rpm_request	request;
 	enum rpm_status		runtime_status;
 	int			runtime_error;
diff --git a/include/linux/pm_runtime.h b/include/linux/pm_runtime.h
index f271860..775e063 100644
--- a/include/linux/pm_runtime.h
+++ b/include/linux/pm_runtime.h
@@ -47,6 +47,7 @@ extern void pm_runtime_set_autosuspend_delay(struct device *dev, int delay);
 extern unsigned long pm_runtime_autosuspend_expiration(struct device *dev);
 extern void pm_runtime_update_max_time_suspended(struct device *dev,
 						 s64 delta_ns);
+extern void pm_runtime_set_memalloc_noio(struct device *dev, bool enable);
 
 static inline bool pm_children_suspended(struct device *dev)
 {
@@ -149,6 +150,8 @@ static inline void pm_runtime_set_autosuspend_delay(struct device *dev,
 						int delay) {}
 static inline unsigned long pm_runtime_autosuspend_expiration(
 				struct device *dev) { return 0; }
+static inline void pm_runtime_set_memalloc_noio(struct device *dev,
+						bool enable){}
 
 #endif /* !CONFIG_PM_RUNTIME */
 
-- 
1.7.9.5


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH v6 3/6] block/genhd.c: apply pm_runtime_set_memalloc_noio on block devices
  2012-11-24 12:59 [PATCH v6 0/6] solve deadlock caused by memory allocation with I/O Ming Lei
  2012-11-24 12:59 ` [PATCH v6 1/6] mm: teach mm by current context info to not do I/O during memory allocation Ming Lei
  2012-11-24 12:59 ` [PATCH v6 2/6] PM / Runtime: introduce pm_runtime_set_memalloc_noio() Ming Lei
@ 2012-11-24 12:59 ` Ming Lei
  2012-11-24 12:59 ` [PATCH v6 4/6] net/core: apply pm_runtime_set_memalloc_noio on network devices Ming Lei
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 19+ messages in thread
From: Ming Lei @ 2012-11-24 12:59 UTC (permalink / raw)
  To: linux-kernel
  Cc: Alan Stern, Oliver Neukum, Minchan Kim, Greg Kroah-Hartman,
	Rafael J. Wysocki, Jens Axboe, David S. Miller, Andrew Morton,
	netdev, linux-usb, linux-pm, linux-mm, Ming Lei

This patch applyes the introduced pm_runtime_set_memalloc_noio on
block device so that PM core will teach mm to not allocate memory with
GFP_IOFS when calling the runtime_resume and runtime_suspend callback
for block devices and its ancestors.

Cc: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Ming Lei <ming.lei@canonical.com>
---
v5:
	- fix code style and one typo
v4:
	- call pm_runtime_set_memalloc_noio(ddev, true) after device_add
---
 block/genhd.c |   10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/block/genhd.c b/block/genhd.c
index 9a289d7..1905966 100644
--- a/block/genhd.c
+++ b/block/genhd.c
@@ -18,6 +18,7 @@
 #include <linux/mutex.h>
 #include <linux/idr.h>
 #include <linux/log2.h>
+#include <linux/pm_runtime.h>
 
 #include "blk.h"
 
@@ -532,6 +533,14 @@ static void register_disk(struct gendisk *disk)
 			return;
 		}
 	}
+
+	/*
+	 * avoid probable deadlock caused by allocating memory with
+	 * GFP_KERNEL in runtime_resume callback of its all ancestor
+	 * devices
+	 */
+	pm_runtime_set_memalloc_noio(ddev, true);
+
 	disk->part0.holder_dir = kobject_create_and_add("holders", &ddev->kobj);
 	disk->slave_dir = kobject_create_and_add("slaves", &ddev->kobj);
 
@@ -661,6 +670,7 @@ void del_gendisk(struct gendisk *disk)
 	disk->driverfs_dev = NULL;
 	if (!sysfs_deprecated)
 		sysfs_remove_link(block_depr, dev_name(disk_to_dev(disk)));
+	pm_runtime_set_memalloc_noio(disk_to_dev(disk), false);
 	device_del(disk_to_dev(disk));
 }
 EXPORT_SYMBOL(del_gendisk);
-- 
1.7.9.5


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH v6 4/6] net/core: apply pm_runtime_set_memalloc_noio on network devices
  2012-11-24 12:59 [PATCH v6 0/6] solve deadlock caused by memory allocation with I/O Ming Lei
                   ` (2 preceding siblings ...)
  2012-11-24 12:59 ` [PATCH v6 3/6] block/genhd.c: apply pm_runtime_set_memalloc_noio on block devices Ming Lei
@ 2012-11-24 12:59 ` Ming Lei
  2012-11-24 12:59 ` [PATCH v6 5/6] PM / Runtime: force memory allocation with no I/O during Runtime PM callbcack Ming Lei
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 19+ messages in thread
From: Ming Lei @ 2012-11-24 12:59 UTC (permalink / raw)
  To: linux-kernel
  Cc: Alan Stern, Oliver Neukum, Minchan Kim, Greg Kroah-Hartman,
	Rafael J. Wysocki, Jens Axboe, David S. Miller, Andrew Morton,
	netdev, linux-usb, linux-pm, linux-mm, Ming Lei, Eric Dumazet,
	David Decotigny, Tom Herbert, Ingo Molnar

Deadlock might be caused by allocating memory with GFP_KERNEL in
runtime_resume and runtime_suspend callback of network devices in
iSCSI situation, so mark network devices and its ancestor as
'memalloc_noio' with the introduced pm_runtime_set_memalloc_noio().

Cc: "David S. Miller" <davem@davemloft.net>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: David Decotigny <david.decotigny@google.com>
Cc: Tom Herbert <therbert@google.com>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Ming Lei <ming.lei@canonical.com>
---
v4:
	 - call pm_runtime_set_memalloc_noio(ddev, true) after
	   device_add
---
 net/core/net-sysfs.c |    5 +++++
 1 file changed, 5 insertions(+)

diff --git a/net/core/net-sysfs.c b/net/core/net-sysfs.c
index bcf02f6..a55d255 100644
--- a/net/core/net-sysfs.c
+++ b/net/core/net-sysfs.c
@@ -22,6 +22,7 @@
 #include <linux/vmalloc.h>
 #include <linux/export.h>
 #include <linux/jiffies.h>
+#include <linux/pm_runtime.h>
 #include <net/wext.h>
 
 #include "net-sysfs.h"
@@ -1386,6 +1387,8 @@ void netdev_unregister_kobject(struct net_device * net)
 
 	remove_queue_kobjects(net);
 
+	pm_runtime_set_memalloc_noio(dev, false);
+
 	device_del(dev);
 }
 
@@ -1421,6 +1424,8 @@ int netdev_register_kobject(struct net_device *net)
 		return error;
 	}
 
+	pm_runtime_set_memalloc_noio(dev, true);
+
 	return error;
 }
 
-- 
1.7.9.5


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH v6 5/6] PM / Runtime: force memory allocation with no I/O during Runtime PM callbcack
  2012-11-24 12:59 [PATCH v6 0/6] solve deadlock caused by memory allocation with I/O Ming Lei
                   ` (3 preceding siblings ...)
  2012-11-24 12:59 ` [PATCH v6 4/6] net/core: apply pm_runtime_set_memalloc_noio on network devices Ming Lei
@ 2012-11-24 12:59 ` Ming Lei
  2012-11-27 21:24   ` Rafael J. Wysocki
  2012-11-24 12:59 ` [PATCH v6 6/6] USB: forbid memory allocation with I/O during bus reset Ming Lei
  2012-11-27 20:25 ` [PATCH v6 0/6] solve deadlock caused by memory allocation with I/O Andrew Morton
  6 siblings, 1 reply; 19+ messages in thread
From: Ming Lei @ 2012-11-24 12:59 UTC (permalink / raw)
  To: linux-kernel
  Cc: Alan Stern, Oliver Neukum, Minchan Kim, Greg Kroah-Hartman,
	Rafael J. Wysocki, Jens Axboe, David S. Miller, Andrew Morton,
	netdev, linux-usb, linux-pm, linux-mm, Ming Lei

This patch applies the introduced memalloc_noio_save() and
memalloc_noio_restore() to force memory allocation with no I/O
during runtime_resume/runtime_suspend callback on device with
the flag of 'memalloc_noio' set.

Cc: Alan Stern <stern@rowland.harvard.edu>
Cc: Oliver Neukum <oneukum@suse.de>
Cc: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: Ming Lei <ming.lei@canonical.com>
---
v5:
	- use inline memalloc_noio_save()
v4:
	- runtime_suspend need this too because rpm_resume may wait for
	completion of concurrent runtime_suspend, so deadlock still may
	be triggered in runtime_suspend path.
---
 drivers/base/power/runtime.c |   32 ++++++++++++++++++++++++++++++--
 1 file changed, 30 insertions(+), 2 deletions(-)

diff --git a/drivers/base/power/runtime.c b/drivers/base/power/runtime.c
index 3e198a0..96d99ea 100644
--- a/drivers/base/power/runtime.c
+++ b/drivers/base/power/runtime.c
@@ -371,6 +371,7 @@ static int rpm_suspend(struct device *dev, int rpmflags)
 	int (*callback)(struct device *);
 	struct device *parent = NULL;
 	int retval;
+	unsigned int noio_flag;
 
 	trace_rpm_suspend(dev, rpmflags);
 
@@ -480,7 +481,20 @@ static int rpm_suspend(struct device *dev, int rpmflags)
 	if (!callback && dev->driver && dev->driver->pm)
 		callback = dev->driver->pm->runtime_suspend;
 
-	retval = rpm_callback(callback, dev);
+	/*
+	 * Deadlock might be caused if memory allocation with GFP_KERNEL
+	 * happens inside runtime_suspend callback of one block device's
+	 * ancestor or the block device itself. Network device might be
+	 * thought as part of iSCSI block device, so network device and
+	 * its ancestor should be marked as memalloc_noio.
+	 */
+	if (dev->power.memalloc_noio) {
+		noio_flag = memalloc_noio_save();
+		retval = rpm_callback(callback, dev);
+		memalloc_noio_restore(noio_flag);
+	} else {
+		retval = rpm_callback(callback, dev);
+	}
 	if (retval)
 		goto fail;
 
@@ -563,6 +577,7 @@ static int rpm_resume(struct device *dev, int rpmflags)
 	int (*callback)(struct device *);
 	struct device *parent = NULL;
 	int retval = 0;
+	unsigned int noio_flag;
 
 	trace_rpm_resume(dev, rpmflags);
 
@@ -712,7 +727,20 @@ static int rpm_resume(struct device *dev, int rpmflags)
 	if (!callback && dev->driver && dev->driver->pm)
 		callback = dev->driver->pm->runtime_resume;
 
-	retval = rpm_callback(callback, dev);
+	/*
+	 * Deadlock might be caused if memory allocation with GFP_KERNEL
+	 * happens inside runtime_resume callback of one block device's
+	 * ancestor or the block device itself. Network device might be
+	 * thought as part of iSCSI block device, so network device and
+	 * its ancestor should be marked as memalloc_noio.
+	 */
+	if (dev->power.memalloc_noio) {
+		noio_flag = memalloc_noio_save();
+		retval = rpm_callback(callback, dev);
+		memalloc_noio_restore(noio_flag);
+	} else {
+		retval = rpm_callback(callback, dev);
+	}
 	if (retval) {
 		__update_runtime_status(dev, RPM_SUSPENDED);
 		pm_runtime_cancel_pending(dev);
-- 
1.7.9.5


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH v6 6/6] USB: forbid memory allocation with I/O during bus reset
  2012-11-24 12:59 [PATCH v6 0/6] solve deadlock caused by memory allocation with I/O Ming Lei
                   ` (4 preceding siblings ...)
  2012-11-24 12:59 ` [PATCH v6 5/6] PM / Runtime: force memory allocation with no I/O during Runtime PM callbcack Ming Lei
@ 2012-11-24 12:59 ` Ming Lei
  2012-11-27 20:25 ` [PATCH v6 0/6] solve deadlock caused by memory allocation with I/O Andrew Morton
  6 siblings, 0 replies; 19+ messages in thread
From: Ming Lei @ 2012-11-24 12:59 UTC (permalink / raw)
  To: linux-kernel
  Cc: Alan Stern, Oliver Neukum, Minchan Kim, Greg Kroah-Hartman,
	Rafael J. Wysocki, Jens Axboe, David S. Miller, Andrew Morton,
	netdev, linux-usb, linux-pm, linux-mm, Ming Lei

If one storage interface or usb network interface(iSCSI case)
exists in current configuration, memory allocation with
GFP_KERNEL during usb_device_reset() might trigger I/O transfer
on the storage interface itself and cause deadlock because
the 'us->dev_mutex' is held in .pre_reset() and the storage
interface can't do I/O transfer when the reset is triggered
by other interface, or the error handling can't be completed
if the reset is triggered by the storage itself(error handling path).
Cc: Alan Stern <stern@rowland.harvard.edu>
Cc: Oliver Neukum <oneukum@suse.de>
Signed-off-by: Ming Lei <ming.lei@canonical.com>
---
v5:
        - use inline memalloc_noio_save()
v4:
	- mark current memalloc_noio for every usb device reset
---
 drivers/usb/core/hub.c |   13 +++++++++++++
 1 file changed, 13 insertions(+)

diff --git a/drivers/usb/core/hub.c b/drivers/usb/core/hub.c
index 90accde..2d5cc1c 100644
--- a/drivers/usb/core/hub.c
+++ b/drivers/usb/core/hub.c
@@ -5040,6 +5040,7 @@ int usb_reset_device(struct usb_device *udev)
 {
 	int ret;
 	int i;
+	unsigned int noio_flag;
 	struct usb_host_config *config = udev->actconfig;
 
 	if (udev->state == USB_STATE_NOTATTACHED ||
@@ -5049,6 +5050,17 @@ int usb_reset_device(struct usb_device *udev)
 		return -EINVAL;
 	}
 
+	/*
+	 * Don't allocate memory with GFP_KERNEL in current
+	 * context to avoid possible deadlock if usb mass
+	 * storage interface or usbnet interface(iSCSI case)
+	 * is included in current configuration. The easist
+	 * approach is to do it for every device reset,
+	 * because the device 'memalloc_noio' flag may have
+	 * not been set before reseting the usb device.
+	 */
+	noio_flag = memalloc_noio_save();
+
 	/* Prevent autosuspend during the reset */
 	usb_autoresume_device(udev);
 
@@ -5093,6 +5105,7 @@ int usb_reset_device(struct usb_device *udev)
 	}
 
 	usb_autosuspend_device(udev);
+	memalloc_noio_restore(noio_flag);
 	return ret;
 }
 EXPORT_SYMBOL_GPL(usb_reset_device);
-- 
1.7.9.5


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: [PATCH v6 0/6] solve deadlock caused by memory allocation with I/O
  2012-11-24 12:59 [PATCH v6 0/6] solve deadlock caused by memory allocation with I/O Ming Lei
                   ` (5 preceding siblings ...)
  2012-11-24 12:59 ` [PATCH v6 6/6] USB: forbid memory allocation with I/O during bus reset Ming Lei
@ 2012-11-27 20:25 ` Andrew Morton
  6 siblings, 0 replies; 19+ messages in thread
From: Andrew Morton @ 2012-11-27 20:25 UTC (permalink / raw)
  To: Ming Lei
  Cc: linux-kernel, Alan Stern, Oliver Neukum, Minchan Kim,
	Greg Kroah-Hartman, Rafael J. Wysocki, Jens Axboe,
	David S. Miller, netdev, linux-usb, linux-pm, linux-mm

On Sat, 24 Nov 2012 20:59:12 +0800
Ming Lei <ming.lei@canonical.com> wrote:

> This patchset try to solve one deadlock problem which might be caused
> by memory allocation with block I/O during runtime PM and block device
> error handling path. Traditionly, the problem is addressed by passing
> GFP_NOIO statically to mm, but that is not a effective solution, see
> detailed description in patch 1's commit log.
> 
> This patch set introduces one process flag and trys to fix the deadlock
> problem on block device/network device during runtime PM or usb bus reset.
> 
> The 1st one is the change on include/sched.h and mm.
> 
> The 2nd patch introduces the flag of memalloc_noio on 'dev_pm_info',
> and pm_runtime_set_memalloc_noio(), so that PM Core can teach mm to not
> allocate mm with GFP_IO during the runtime_resume callback only on
> device with the flag set.
> 
> The following 2 patches apply the introduced pm_runtime_set_memalloc_noio()
> to mark all devices as memalloc_noio_resume in the path from the block or
> network device to the root device in device tree.
> 
> The last 2 patches are applied again PM and USB subsystem to demonstrate
> how to use the introduced mechanism to fix the deadlock problem.
> 
> Andrew, could you queue these patches into your tree since V6 fixes all
> your concerns and looks no one objects these patches?

Yes, this patchset looks ready to run with.  But as we're at -rc7 I'll ask
you to refresh, retest and resend after 3.8-rc1, please.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v6 2/6] PM / Runtime: introduce pm_runtime_set_memalloc_noio()
  2012-11-24 12:59 ` [PATCH v6 2/6] PM / Runtime: introduce pm_runtime_set_memalloc_noio() Ming Lei
@ 2012-11-27 21:19   ` Rafael J. Wysocki
  2012-11-27 21:46     ` Rafael J. Wysocki
                       ` (2 more replies)
  0 siblings, 3 replies; 19+ messages in thread
From: Rafael J. Wysocki @ 2012-11-27 21:19 UTC (permalink / raw)
  To: linux-pm
  Cc: Ming Lei, linux-kernel, Alan Stern, Oliver Neukum, Minchan Kim,
	Greg Kroah-Hartman, Jens Axboe, David S. Miller, Andrew Morton,
	netdev, linux-usb, linux-mm

On Saturday, November 24, 2012 08:59:14 PM Ming Lei wrote:
> The patch introduces the flag of memalloc_noio in 'struct dev_pm_info'
> to help PM core to teach mm not allocating memory with GFP_KERNEL
> flag for avoiding probable deadlock.
> 
> As explained in the comment, any GFP_KERNEL allocation inside
> runtime_resume() or runtime_suspend() on any one of device in
> the path from one block or network device to the root device
> in the device tree may cause deadlock, the introduced
> pm_runtime_set_memalloc_noio() sets or clears the flag on
> device in the path recursively.
> 
> Cc: Alan Stern <stern@rowland.harvard.edu>
> Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
> Signed-off-by: Ming Lei <ming.lei@canonical.com>
> ---
> v5:
> 	- fix code style error
> 	- add comment on clear the device memalloc_noio flag
> v4:
> 	- rename memalloc_noio_resume as memalloc_noio
> 	- remove pm_runtime_get_memalloc_noio()
> 	- add comments on pm_runtime_set_memalloc_noio
> v3:
> 	- introduce pm_runtime_get_memalloc_noio()
> 	- hold one global lock on pm_runtime_set_memalloc_noio
> 	- hold device power lock when accessing memalloc_noio_resume
> 	  flag suggested by Alan Stern
> 	- implement pm_runtime_set_memalloc_noio without recursion
> 	  suggested by Alan Stern
> v2:
> 	- introduce pm_runtime_set_memalloc_noio()
> ---
>  drivers/base/power/runtime.c |   60 ++++++++++++++++++++++++++++++++++++++++++
>  include/linux/pm.h           |    1 +
>  include/linux/pm_runtime.h   |    3 +++
>  3 files changed, 64 insertions(+)
> 
> diff --git a/drivers/base/power/runtime.c b/drivers/base/power/runtime.c
> index 3148b10..3e198a0 100644
> --- a/drivers/base/power/runtime.c
> +++ b/drivers/base/power/runtime.c
> @@ -124,6 +124,66 @@ unsigned long pm_runtime_autosuspend_expiration(struct device *dev)
>  }
>  EXPORT_SYMBOL_GPL(pm_runtime_autosuspend_expiration);
>  
> +static int dev_memalloc_noio(struct device *dev, void *data)
> +{
> +	return dev->power.memalloc_noio;
> +}
> +
> +/*
> + * pm_runtime_set_memalloc_noio - Set a device's memalloc_noio flag.
> + * @dev: Device to handle.
> + * @enable: True for setting the flag and False for clearing the flag.
> + *
> + * Set the flag for all devices in the path from the device to the
> + * root device in the device tree if @enable is true, otherwise clear
> + * the flag for devices in the path whose siblings don't set the flag.
> + *

Please use counters instead of walking the whole path every time.  Ie. in
addition to the flag add a counter to store the number of the device's
children having that flag set.

Besides, don't you need to check children for the arg device itself?

> + * The function should only be called by block device, or network
> + * device driver for solving the deadlock problem during runtime
> + * resume/suspend:
> + *
> + *     If memory allocation with GFP_KERNEL is called inside runtime
> + *     resume/suspend callback of any one of its ancestors(or the
> + *     block device itself), the deadlock may be triggered inside the
> + *     memory allocation since it might not complete until the block
> + *     device becomes active and the involed page I/O finishes. The
> + *     situation is pointed out first by Alan Stern. Network device
> + *     are involved in iSCSI kind of situation.
> + *
> + * The lock of dev_hotplug_mutex is held in the function for handling
> + * hotplug race because pm_runtime_set_memalloc_noio() may be called
> + * in async probe().
> + *
> + * The function should be called between device_add() and device_del()
> + * on the affected device(block/network device).
> + */
> +void pm_runtime_set_memalloc_noio(struct device *dev, bool enable)
> +{
> +	static DEFINE_MUTEX(dev_hotplug_mutex);

What's the mutex for?

> +
> +	mutex_lock(&dev_hotplug_mutex);
> +	for (;;) {
> +		/* hold power lock since bitfield is not SMP-safe. */
> +		spin_lock_irq(&dev->power.lock);
> +		dev->power.memalloc_noio = enable;
> +		spin_unlock_irq(&dev->power.lock);
> +
> +		dev = dev->parent;
> +
> +		/*
> +		 * clear flag of the parent device only if all the
> +		 * children don't set the flag because ancestor's
> +		 * flag was set by any one of the descendants.
> +		 */
> +		if (!dev || (!enable &&
> +			     device_for_each_child(dev, NULL,
> +						   dev_memalloc_noio)))
> +			break;
> +	}
> +	mutex_unlock(&dev_hotplug_mutex);
> +}
> +EXPORT_SYMBOL_GPL(pm_runtime_set_memalloc_noio);
> +
>  /**
>   * rpm_check_suspend_allowed - Test whether a device may be suspended.
>   * @dev: Device to test.
> diff --git a/include/linux/pm.h b/include/linux/pm.h
> index 03d7bb1..1a8a69d 100644
> --- a/include/linux/pm.h
> +++ b/include/linux/pm.h
> @@ -538,6 +538,7 @@ struct dev_pm_info {
>  	unsigned int		irq_safe:1;
>  	unsigned int		use_autosuspend:1;
>  	unsigned int		timer_autosuspends:1;
> +	unsigned int		memalloc_noio:1;
>  	enum rpm_request	request;
>  	enum rpm_status		runtime_status;
>  	int			runtime_error;
> diff --git a/include/linux/pm_runtime.h b/include/linux/pm_runtime.h
> index f271860..775e063 100644
> --- a/include/linux/pm_runtime.h
> +++ b/include/linux/pm_runtime.h
> @@ -47,6 +47,7 @@ extern void pm_runtime_set_autosuspend_delay(struct device *dev, int delay);
>  extern unsigned long pm_runtime_autosuspend_expiration(struct device *dev);
>  extern void pm_runtime_update_max_time_suspended(struct device *dev,
>  						 s64 delta_ns);
> +extern void pm_runtime_set_memalloc_noio(struct device *dev, bool enable);
>  
>  static inline bool pm_children_suspended(struct device *dev)
>  {
> @@ -149,6 +150,8 @@ static inline void pm_runtime_set_autosuspend_delay(struct device *dev,
>  						int delay) {}
>  static inline unsigned long pm_runtime_autosuspend_expiration(
>  				struct device *dev) { return 0; }
> +static inline void pm_runtime_set_memalloc_noio(struct device *dev,
> +						bool enable){}
>  
>  #endif /* !CONFIG_PM_RUNTIME */
>  
> 
-- 
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v6 5/6] PM / Runtime: force memory allocation with no I/O during Runtime PM callbcack
  2012-11-24 12:59 ` [PATCH v6 5/6] PM / Runtime: force memory allocation with no I/O during Runtime PM callbcack Ming Lei
@ 2012-11-27 21:24   ` Rafael J. Wysocki
  2012-11-28  3:06     ` Ming Lei
  0 siblings, 1 reply; 19+ messages in thread
From: Rafael J. Wysocki @ 2012-11-27 21:24 UTC (permalink / raw)
  To: linux-pm
  Cc: Ming Lei, linux-kernel, Alan Stern, Oliver Neukum, Minchan Kim,
	Greg Kroah-Hartman, Jens Axboe, David S. Miller, Andrew Morton,
	netdev, linux-usb, linux-mm

On Saturday, November 24, 2012 08:59:17 PM Ming Lei wrote:
> This patch applies the introduced memalloc_noio_save() and
> memalloc_noio_restore() to force memory allocation with no I/O
> during runtime_resume/runtime_suspend callback on device with
> the flag of 'memalloc_noio' set.
> 
> Cc: Alan Stern <stern@rowland.harvard.edu>
> Cc: Oliver Neukum <oneukum@suse.de>
> Cc: Rafael J. Wysocki <rjw@sisk.pl>
> Signed-off-by: Ming Lei <ming.lei@canonical.com>
> ---
> v5:
> 	- use inline memalloc_noio_save()
> v4:
> 	- runtime_suspend need this too because rpm_resume may wait for
> 	completion of concurrent runtime_suspend, so deadlock still may
> 	be triggered in runtime_suspend path.
> ---
>  drivers/base/power/runtime.c |   32 ++++++++++++++++++++++++++++++--
>  1 file changed, 30 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/base/power/runtime.c b/drivers/base/power/runtime.c
> index 3e198a0..96d99ea 100644
> --- a/drivers/base/power/runtime.c
> +++ b/drivers/base/power/runtime.c
> @@ -371,6 +371,7 @@ static int rpm_suspend(struct device *dev, int rpmflags)
>  	int (*callback)(struct device *);
>  	struct device *parent = NULL;
>  	int retval;
> +	unsigned int noio_flag;
>  
>  	trace_rpm_suspend(dev, rpmflags);
>  
> @@ -480,7 +481,20 @@ static int rpm_suspend(struct device *dev, int rpmflags)
>  	if (!callback && dev->driver && dev->driver->pm)
>  		callback = dev->driver->pm->runtime_suspend;
>  
> -	retval = rpm_callback(callback, dev);
> +	/*
> +	 * Deadlock might be caused if memory allocation with GFP_KERNEL
> +	 * happens inside runtime_suspend callback of one block device's
> +	 * ancestor or the block device itself. Network device might be
> +	 * thought as part of iSCSI block device, so network device and
> +	 * its ancestor should be marked as memalloc_noio.
> +	 */
> +	if (dev->power.memalloc_noio) {
> +		noio_flag = memalloc_noio_save();
> +		retval = rpm_callback(callback, dev);
> +		memalloc_noio_restore(noio_flag);
> +	} else {
> +		retval = rpm_callback(callback, dev);
> +	}
>  	if (retval)
>  		goto fail;
>  
> @@ -563,6 +577,7 @@ static int rpm_resume(struct device *dev, int rpmflags)
>  	int (*callback)(struct device *);
>  	struct device *parent = NULL;
>  	int retval = 0;
> +	unsigned int noio_flag;
>  
>  	trace_rpm_resume(dev, rpmflags);
>  
> @@ -712,7 +727,20 @@ static int rpm_resume(struct device *dev, int rpmflags)
>  	if (!callback && dev->driver && dev->driver->pm)
>  		callback = dev->driver->pm->runtime_resume;
>  
> -	retval = rpm_callback(callback, dev);
> +	/*
> +	 * Deadlock might be caused if memory allocation with GFP_KERNEL
> +	 * happens inside runtime_resume callback of one block device's
> +	 * ancestor or the block device itself. Network device might be
> +	 * thought as part of iSCSI block device, so network device and
> +	 * its ancestor should be marked as memalloc_noio.
> +	 */
> +	if (dev->power.memalloc_noio) {
> +		noio_flag = memalloc_noio_save();
> +		retval = rpm_callback(callback, dev);
> +		memalloc_noio_restore(noio_flag);
> +	} else {
> +		retval = rpm_callback(callback, dev);
> +	}

Please don't duplicate code this way.

You can move that whole thing to rpm_callback().  Yes, you'll probably need to
check dev->power.memalloc_noio twice in there, but that's OK.


>  	if (retval) {
>  		__update_runtime_status(dev, RPM_SUSPENDED);
>  		pm_runtime_cancel_pending(dev);
> 

Thanks,
Rafael


-- 
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v6 2/6] PM / Runtime: introduce pm_runtime_set_memalloc_noio()
  2012-11-27 21:19   ` Rafael J. Wysocki
@ 2012-11-27 21:46     ` Rafael J. Wysocki
  2012-11-28  3:57     ` Ming Lei
  2012-11-28  4:34     ` Ming Lei
  2 siblings, 0 replies; 19+ messages in thread
From: Rafael J. Wysocki @ 2012-11-27 21:46 UTC (permalink / raw)
  To: linux-pm
  Cc: Ming Lei, linux-kernel, Alan Stern, Oliver Neukum, Minchan Kim,
	Greg Kroah-Hartman, Jens Axboe, David S. Miller, Andrew Morton,
	netdev, linux-usb, linux-mm

On Tuesday, November 27, 2012 10:19:29 PM Rafael J. Wysocki wrote:
> On Saturday, November 24, 2012 08:59:14 PM Ming Lei wrote:
> > The patch introduces the flag of memalloc_noio in 'struct dev_pm_info'
> > to help PM core to teach mm not allocating memory with GFP_KERNEL
> > flag for avoiding probable deadlock.
> > 
> > As explained in the comment, any GFP_KERNEL allocation inside
> > runtime_resume() or runtime_suspend() on any one of device in
> > the path from one block or network device to the root device
> > in the device tree may cause deadlock, the introduced
> > pm_runtime_set_memalloc_noio() sets or clears the flag on
> > device in the path recursively.
> > 
> > Cc: Alan Stern <stern@rowland.harvard.edu>
> > Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
> > Signed-off-by: Ming Lei <ming.lei@canonical.com>
> > ---
> > v5:
> > 	- fix code style error
> > 	- add comment on clear the device memalloc_noio flag
> > v4:
> > 	- rename memalloc_noio_resume as memalloc_noio
> > 	- remove pm_runtime_get_memalloc_noio()
> > 	- add comments on pm_runtime_set_memalloc_noio
> > v3:
> > 	- introduce pm_runtime_get_memalloc_noio()
> > 	- hold one global lock on pm_runtime_set_memalloc_noio
> > 	- hold device power lock when accessing memalloc_noio_resume
> > 	  flag suggested by Alan Stern
> > 	- implement pm_runtime_set_memalloc_noio without recursion
> > 	  suggested by Alan Stern
> > v2:
> > 	- introduce pm_runtime_set_memalloc_noio()
> > ---
> >  drivers/base/power/runtime.c |   60 ++++++++++++++++++++++++++++++++++++++++++
> >  include/linux/pm.h           |    1 +
> >  include/linux/pm_runtime.h   |    3 +++
> >  3 files changed, 64 insertions(+)
> > 
> > diff --git a/drivers/base/power/runtime.c b/drivers/base/power/runtime.c
> > index 3148b10..3e198a0 100644
> > --- a/drivers/base/power/runtime.c
> > +++ b/drivers/base/power/runtime.c
> > @@ -124,6 +124,66 @@ unsigned long pm_runtime_autosuspend_expiration(struct device *dev)
> >  }
> >  EXPORT_SYMBOL_GPL(pm_runtime_autosuspend_expiration);
> >  
> > +static int dev_memalloc_noio(struct device *dev, void *data)
> > +{
> > +	return dev->power.memalloc_noio;
> > +}
> > +
> > +/*
> > + * pm_runtime_set_memalloc_noio - Set a device's memalloc_noio flag.
> > + * @dev: Device to handle.
> > + * @enable: True for setting the flag and False for clearing the flag.
> > + *
> > + * Set the flag for all devices in the path from the device to the
> > + * root device in the device tree if @enable is true, otherwise clear
> > + * the flag for devices in the path whose siblings don't set the flag.
> > + *
> 
> Please use counters instead of walking the whole path every time.  Ie. in
> addition to the flag add a counter to store the number of the device's
> children having that flag set.

I would use the flag only to store the information that
pm_runtime_set_memalloc_noio(dev, true) has been run for this device directly
and I'd use a counter for everything else.

That is, have power.memalloc_count that would be incremented when (1)
pm_runtime_set_memalloc_noio(dev, true) is called for that device and (2) when
power.memalloc_count for one of its children changes from 0 to 1 (and
analogously for decrementation).  Then, check the counter in rpm_callback().

Thanks,
Rafael


-- 
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v6 5/6] PM / Runtime: force memory allocation with no I/O during Runtime PM callbcack
  2012-11-27 21:24   ` Rafael J. Wysocki
@ 2012-11-28  3:06     ` Ming Lei
  0 siblings, 0 replies; 19+ messages in thread
From: Ming Lei @ 2012-11-28  3:06 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: linux-pm, linux-kernel, Alan Stern, Oliver Neukum, Minchan Kim,
	Greg Kroah-Hartman, Jens Axboe, David S. Miller, Andrew Morton,
	netdev, linux-usb, linux-mm

On Wed, Nov 28, 2012 at 5:24 AM, Rafael J. Wysocki <rjw@sisk.pl> wrote:
>
> Please don't duplicate code this way.
>
> You can move that whole thing to rpm_callback().  Yes, you'll probably need to
> check dev->power.memalloc_noio twice in there, but that's OK.

Good idea, I will update it in v7.

Thanks,
--
Ming Lei

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v6 2/6] PM / Runtime: introduce pm_runtime_set_memalloc_noio()
  2012-11-27 21:19   ` Rafael J. Wysocki
  2012-11-27 21:46     ` Rafael J. Wysocki
@ 2012-11-28  3:57     ` Ming Lei
  2012-11-28 10:06       ` Rafael J. Wysocki
  2012-11-28  4:34     ` Ming Lei
  2 siblings, 1 reply; 19+ messages in thread
From: Ming Lei @ 2012-11-28  3:57 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: linux-pm, linux-kernel, Alan Stern, Oliver Neukum, Minchan Kim,
	Greg Kroah-Hartman, Jens Axboe, David S. Miller, Andrew Morton,
	netdev, linux-usb, linux-mm

On Wed, Nov 28, 2012 at 5:19 AM, Rafael J. Wysocki <rjw@sisk.pl> wrote:
> On Saturday, November 24, 2012 08:59:14 PM Ming Lei wrote:
>> The patch introduces the flag of memalloc_noio in 'struct dev_pm_info'
>> to help PM core to teach mm not allocating memory with GFP_KERNEL
>> flag for avoiding probable deadlock.
>>
>> As explained in the comment, any GFP_KERNEL allocation inside
>> runtime_resume() or runtime_suspend() on any one of device in
>> the path from one block or network device to the root device
>> in the device tree may cause deadlock, the introduced
>> pm_runtime_set_memalloc_noio() sets or clears the flag on
>> device in the path recursively.
>>
>> Cc: Alan Stern <stern@rowland.harvard.edu>
>> Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
>> Signed-off-by: Ming Lei <ming.lei@canonical.com>
>> ---
>> v5:
>>       - fix code style error
>>       - add comment on clear the device memalloc_noio flag
>> v4:
>>       - rename memalloc_noio_resume as memalloc_noio
>>       - remove pm_runtime_get_memalloc_noio()
>>       - add comments on pm_runtime_set_memalloc_noio
>> v3:
>>       - introduce pm_runtime_get_memalloc_noio()
>>       - hold one global lock on pm_runtime_set_memalloc_noio
>>       - hold device power lock when accessing memalloc_noio_resume
>>         flag suggested by Alan Stern
>>       - implement pm_runtime_set_memalloc_noio without recursion
>>         suggested by Alan Stern
>> v2:
>>       - introduce pm_runtime_set_memalloc_noio()
>> ---
>>  drivers/base/power/runtime.c |   60 ++++++++++++++++++++++++++++++++++++++++++
>>  include/linux/pm.h           |    1 +
>>  include/linux/pm_runtime.h   |    3 +++
>>  3 files changed, 64 insertions(+)
>>
>> diff --git a/drivers/base/power/runtime.c b/drivers/base/power/runtime.c
>> index 3148b10..3e198a0 100644
>> --- a/drivers/base/power/runtime.c
>> +++ b/drivers/base/power/runtime.c
>> @@ -124,6 +124,66 @@ unsigned long pm_runtime_autosuspend_expiration(struct device *dev)
>>  }
>>  EXPORT_SYMBOL_GPL(pm_runtime_autosuspend_expiration);
>>
>> +static int dev_memalloc_noio(struct device *dev, void *data)
>> +{
>> +     return dev->power.memalloc_noio;
>> +}
>> +
>> +/*
>> + * pm_runtime_set_memalloc_noio - Set a device's memalloc_noio flag.
>> + * @dev: Device to handle.
>> + * @enable: True for setting the flag and False for clearing the flag.
>> + *
>> + * Set the flag for all devices in the path from the device to the
>> + * root device in the device tree if @enable is true, otherwise clear
>> + * the flag for devices in the path whose siblings don't set the flag.
>> + *
>
> Please use counters instead of walking the whole path every time.  Ie. in
> addition to the flag add a counter to store the number of the device's
> children having that flag set.

Thanks for your review.

IMO, pm_runtime_set_memalloc_noio() is only called in
probe() and release() of block device and network device, which is
in a very infrequent path, so I am wondering if it is worthy of introducing
another counter for all devices.

Also looks the current implementation of pm_runtime_set_memalloc_noio()
is simple and clean enough with the flag, IMO.

> I would use the flag only to store the information that
> pm_runtime_set_memalloc_noio(dev, true) has been run for this device directly
> and I'd use a counter for everything else.
>
> That is, have power.memalloc_count that would be incremented when (1)
> pm_runtime_set_memalloc_noio(dev, true) is called for that device and (2) when
> power.memalloc_count for one of its children changes from 0 to 1 (and
> analogously for decrementation).  Then, check the counter in rpm_callback().

Sorry, could you explain in a bit detail why we need the counter? Looks only
checking the flag in rpm_callback() is enough, doesn't it?

>
> Besides, don't you need to check children for the arg device itself?

It isn't needed since the children of network/block device can't be
involved of the deadlock in runtime PM path.

Also, the function is only called by network device or block device
subsystem, both the two kind of device are class device and should
have no children.

>
>> + * The function should only be called by block device, or network
>> + * device driver for solving the deadlock problem during runtime
>> + * resume/suspend:
>> + *
>> + *     If memory allocation with GFP_KERNEL is called inside runtime
>> + *     resume/suspend callback of any one of its ancestors(or the
>> + *     block device itself), the deadlock may be triggered inside the
>> + *     memory allocation since it might not complete until the block
>> + *     device becomes active and the involed page I/O finishes. The
>> + *     situation is pointed out first by Alan Stern. Network device
>> + *     are involved in iSCSI kind of situation.
>> + *
>> + * The lock of dev_hotplug_mutex is held in the function for handling
>> + * hotplug race because pm_runtime_set_memalloc_noio() may be called
>> + * in async probe().
>> + *
>> + * The function should be called between device_add() and device_del()
>> + * on the affected device(block/network device).
>> + */
>> +void pm_runtime_set_memalloc_noio(struct device *dev, bool enable)
>> +{
>> +     static DEFINE_MUTEX(dev_hotplug_mutex);
>
> What's the mutex for?

It is for avoiding hotplug race, for example, without the mutex,
another child may set the flag between the time device_for_each_child()
runs and the next loop iteration in pm_runtime_set_memalloc_noio(false).

Thanks,
--
Ming Lei

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v6 2/6] PM / Runtime: introduce pm_runtime_set_memalloc_noio()
  2012-11-27 21:19   ` Rafael J. Wysocki
  2012-11-27 21:46     ` Rafael J. Wysocki
  2012-11-28  3:57     ` Ming Lei
@ 2012-11-28  4:34     ` Ming Lei
  2012-11-28  9:29       ` Rafael J. Wysocki
  2 siblings, 1 reply; 19+ messages in thread
From: Ming Lei @ 2012-11-28  4:34 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: linux-pm, linux-kernel, Alan Stern, Oliver Neukum, Minchan Kim,
	Greg Kroah-Hartman, Jens Axboe, David S. Miller, Andrew Morton,
	netdev, linux-usb, linux-mm

On Wed, Nov 28, 2012 at 5:19 AM, Rafael J. Wysocki <rjw@sisk.pl> wrote:
>
> Please use counters instead of walking the whole path every time.  Ie. in
> addition to the flag add a counter to store the number of the device's
> children having that flag set.

Even though counter is added, walking the whole path can't be avoided too,
and may be a explicit walking or recursion, because pm_runtime_set_memalloc_noio
is required to set or clear the flag(or increase/decrease the counter) of
devices in the whole path.

Thanks,
--
Ming Lei

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v6 2/6] PM / Runtime: introduce pm_runtime_set_memalloc_noio()
  2012-11-28  4:34     ` Ming Lei
@ 2012-11-28  9:29       ` Rafael J. Wysocki
  2012-11-28  9:47         ` Ming Lei
  0 siblings, 1 reply; 19+ messages in thread
From: Rafael J. Wysocki @ 2012-11-28  9:29 UTC (permalink / raw)
  To: Ming Lei
  Cc: linux-pm, linux-kernel, Alan Stern, Oliver Neukum, Minchan Kim,
	Greg Kroah-Hartman, Jens Axboe, David S. Miller, Andrew Morton,
	netdev, linux-usb, linux-mm

On Wednesday, November 28, 2012 12:34:36 PM Ming Lei wrote:
> On Wed, Nov 28, 2012 at 5:19 AM, Rafael J. Wysocki <rjw@sisk.pl> wrote:
> >
> > Please use counters instead of walking the whole path every time.  Ie. in
> > addition to the flag add a counter to store the number of the device's
> > children having that flag set.
> 
> Even though counter is added, walking the whole path can't be avoided too,
> and may be a explicit walking or recursion, because pm_runtime_set_memalloc_noio
> is required to set or clear the flag(or increase/decrease the counter) of
> devices in the whole path.

But it doesn't have to walk the children.  Moreover, with counters it only
needs to walk the whole path if all devices in it need to be updated.  For
example, if you call pm_runtime_set_memalloc_noio(dev, true) for a device
whose parent's counter is greater than zero already, you don't need to
walk the path above the parent.

Thanks,
Rafael


-- 
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v6 2/6] PM / Runtime: introduce pm_runtime_set_memalloc_noio()
  2012-11-28  9:29       ` Rafael J. Wysocki
@ 2012-11-28  9:47         ` Ming Lei
  2012-11-28 10:15           ` Rafael J. Wysocki
  0 siblings, 1 reply; 19+ messages in thread
From: Ming Lei @ 2012-11-28  9:47 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: linux-pm, linux-kernel, Alan Stern, Oliver Neukum, Minchan Kim,
	Greg Kroah-Hartman, Jens Axboe, David S. Miller, Andrew Morton,
	netdev, linux-usb, linux-mm

On Wed, Nov 28, 2012 at 5:29 PM, Rafael J. Wysocki <rjw@sisk.pl> wrote:
>
> But it doesn't have to walk the children.  Moreover, with counters it only

Yeah, I got it, it is the advantage of counter, but with extra 'int'
field introduced
in 'struct device'.

> needs to walk the whole path if all devices in it need to be updated.  For
> example, if you call pm_runtime_set_memalloc_noio(dev, true) for a device
> whose parent's counter is greater than zero already, you don't need to
> walk the path above the parent.

We still can do it with the flag only, pm_runtime_set_memalloc_noio(dev, true)
can return immediately if one parent or the 'dev' flag is true.

But considered that the pm_runtime_set_memalloc_noio(dev, false) is only
called in a very infrequent path(network/block device->remove()), looks the
introduced cost isn't worthy of the obtained advantage.

So could you accept not introducing counter? and I will update with the
above improvement you suggested.

Thanks,
--
Ming Lei

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v6 2/6] PM / Runtime: introduce pm_runtime_set_memalloc_noio()
  2012-11-28  3:57     ` Ming Lei
@ 2012-11-28 10:06       ` Rafael J. Wysocki
  2012-11-28 13:51         ` Ming Lei
  0 siblings, 1 reply; 19+ messages in thread
From: Rafael J. Wysocki @ 2012-11-28 10:06 UTC (permalink / raw)
  To: Ming Lei
  Cc: linux-pm, linux-kernel, Alan Stern, Oliver Neukum, Minchan Kim,
	Greg Kroah-Hartman, Jens Axboe, David S. Miller, Andrew Morton,
	netdev, linux-usb, linux-mm

On Wednesday, November 28, 2012 11:57:19 AM Ming Lei wrote:
> On Wed, Nov 28, 2012 at 5:19 AM, Rafael J. Wysocki <rjw@sisk.pl> wrote:
> > On Saturday, November 24, 2012 08:59:14 PM Ming Lei wrote:
> >> The patch introduces the flag of memalloc_noio in 'struct dev_pm_info'
> >> to help PM core to teach mm not allocating memory with GFP_KERNEL
> >> flag for avoiding probable deadlock.
> >>
> >> As explained in the comment, any GFP_KERNEL allocation inside
> >> runtime_resume() or runtime_suspend() on any one of device in
> >> the path from one block or network device to the root device
> >> in the device tree may cause deadlock, the introduced
> >> pm_runtime_set_memalloc_noio() sets or clears the flag on
> >> device in the path recursively.
> >>
> >> Cc: Alan Stern <stern@rowland.harvard.edu>
> >> Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
> >> Signed-off-by: Ming Lei <ming.lei@canonical.com>
> >> ---
> >> v5:
> >>       - fix code style error
> >>       - add comment on clear the device memalloc_noio flag
> >> v4:
> >>       - rename memalloc_noio_resume as memalloc_noio
> >>       - remove pm_runtime_get_memalloc_noio()
> >>       - add comments on pm_runtime_set_memalloc_noio
> >> v3:
> >>       - introduce pm_runtime_get_memalloc_noio()
> >>       - hold one global lock on pm_runtime_set_memalloc_noio
> >>       - hold device power lock when accessing memalloc_noio_resume
> >>         flag suggested by Alan Stern
> >>       - implement pm_runtime_set_memalloc_noio without recursion
> >>         suggested by Alan Stern
> >> v2:
> >>       - introduce pm_runtime_set_memalloc_noio()
> >> ---
> >>  drivers/base/power/runtime.c |   60 ++++++++++++++++++++++++++++++++++++++++++
> >>  include/linux/pm.h           |    1 +
> >>  include/linux/pm_runtime.h   |    3 +++
> >>  3 files changed, 64 insertions(+)
> >>
> >> diff --git a/drivers/base/power/runtime.c b/drivers/base/power/runtime.c
> >> index 3148b10..3e198a0 100644
> >> --- a/drivers/base/power/runtime.c
> >> +++ b/drivers/base/power/runtime.c
> >> @@ -124,6 +124,66 @@ unsigned long pm_runtime_autosuspend_expiration(struct device *dev)
> >>  }
> >>  EXPORT_SYMBOL_GPL(pm_runtime_autosuspend_expiration);
> >>
> >> +static int dev_memalloc_noio(struct device *dev, void *data)
> >> +{
> >> +     return dev->power.memalloc_noio;
> >> +}
> >> +
> >> +/*
> >> + * pm_runtime_set_memalloc_noio - Set a device's memalloc_noio flag.
> >> + * @dev: Device to handle.
> >> + * @enable: True for setting the flag and False for clearing the flag.
> >> + *
> >> + * Set the flag for all devices in the path from the device to the
> >> + * root device in the device tree if @enable is true, otherwise clear
> >> + * the flag for devices in the path whose siblings don't set the flag.
> >> + *
> >
> > Please use counters instead of walking the whole path every time.  Ie. in
> > addition to the flag add a counter to store the number of the device's
> > children having that flag set.
> 
> Thanks for your review.
> 
> IMO, pm_runtime_set_memalloc_noio() is only called in
> probe() and release() of block device and network device, which is
> in a very infrequent path, so I am wondering if it is worthy of introducing
> another counter for all devices.

Well, it may be unfrequent, but does it mean it has to do things that may
be avoided (ie. walking the children of every node in the path in some cases)?

I don't really think that the counters would cost us that much anyway.

> Also looks the current implementation of pm_runtime_set_memalloc_noio()
> is simple and clean enough with the flag, IMO.

I know you always know better. :-)

> > I would use the flag only to store the information that
> > pm_runtime_set_memalloc_noio(dev, true) has been run for this device directly
> > and I'd use a counter for everything else.
> >
> > That is, have power.memalloc_count that would be incremented when (1)
> > pm_runtime_set_memalloc_noio(dev, true) is called for that device and (2) when
> > power.memalloc_count for one of its children changes from 0 to 1 (and
> > analogously for decrementation).  Then, check the counter in rpm_callback().
> 
> Sorry, could you explain in a bit detail why we need the counter? Looks only
> checking the flag in rpm_callback() is enough, doesn't it?

Why would I want to use power.memalloc_count in addition to the
power.memalloc_noio flag?

Consider this:

pm_runtime_set_memalloc_noio(dev):
	return if power.memalloc_noio is set
	set power.memalloc_noio
  loop:
	increment power.memalloc_count
	if power.memalloc_count is 1 now switch to parent and go to loop

pm_runtime_clear_memalloc_noio(dev):
	return if power.memalloc_noio is unset
	unset power.memalloc_noio
  loop:
	decrement power.memalloc_count
	if power.memalloc_count is 0 now switch to parent and go to loop

Looks kind of simpler, doesn't it?

And why rpm_callback() should check power.memalloc_count instead of the count?
Because power.memalloc_noio will only be set for devices that
pm_runtime_set_memalloc_noio(dev) was called for directly (not necessarily for
the parents).

And that works even if someone calls any of them twice in a row for the same
device (presumably by mistake) and doesn't have to make any assumptions
about devices it is called for.

> > Besides, don't you need to check children for the arg device itself?
> 
> It isn't needed since the children of network/block device can't be
> involved of the deadlock in runtime PM path.
> 
> Also, the function is only called by network device or block device
> subsystem, both the two kind of device are class device and should
> have no children.

OK, so not walking the arg device's children is an optimization related to
some assumptions regarding who's supposed to use this routine.  That should
be clearly documented.

However, I'd prefer it not to make such assumptions in the first place.

> >> + * The function should only be called by block device, or network
> >> + * device driver for solving the deadlock problem during runtime
> >> + * resume/suspend:
> >> + *
> >> + *     If memory allocation with GFP_KERNEL is called inside runtime
> >> + *     resume/suspend callback of any one of its ancestors(or the
> >> + *     block device itself), the deadlock may be triggered inside the
> >> + *     memory allocation since it might not complete until the block
> >> + *     device becomes active and the involed page I/O finishes. The
> >> + *     situation is pointed out first by Alan Stern. Network device
> >> + *     are involved in iSCSI kind of situation.
> >> + *
> >> + * The lock of dev_hotplug_mutex is held in the function for handling
> >> + * hotplug race because pm_runtime_set_memalloc_noio() may be called
> >> + * in async probe().
> >> + *
> >> + * The function should be called between device_add() and device_del()
> >> + * on the affected device(block/network device).
> >> + */
> >> +void pm_runtime_set_memalloc_noio(struct device *dev, bool enable)
> >> +{
> >> +     static DEFINE_MUTEX(dev_hotplug_mutex);
> >
> > What's the mutex for?
> 
> It is for avoiding hotplug race, for example, without the mutex,
> another child may set the flag between the time device_for_each_child()
> runs and the next loop iteration in pm_runtime_set_memalloc_noio(false).

OK

Thanks,
Rafael


-- 
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v6 2/6] PM / Runtime: introduce pm_runtime_set_memalloc_noio()
  2012-11-28  9:47         ` Ming Lei
@ 2012-11-28 10:15           ` Rafael J. Wysocki
  0 siblings, 0 replies; 19+ messages in thread
From: Rafael J. Wysocki @ 2012-11-28 10:15 UTC (permalink / raw)
  To: Ming Lei
  Cc: linux-pm, linux-kernel, Alan Stern, Oliver Neukum, Minchan Kim,
	Greg Kroah-Hartman, Jens Axboe, David S. Miller, Andrew Morton,
	netdev, linux-usb, linux-mm

On Wednesday, November 28, 2012 05:47:18 PM Ming Lei wrote:
> On Wed, Nov 28, 2012 at 5:29 PM, Rafael J. Wysocki <rjw@sisk.pl> wrote:
> >
> > But it doesn't have to walk the children.  Moreover, with counters it only
> 
> Yeah, I got it, it is the advantage of counter, but with extra 'int'
> field introduced
> in 'struct device'.
> 
> > needs to walk the whole path if all devices in it need to be updated.  For
> > example, if you call pm_runtime_set_memalloc_noio(dev, true) for a device
> > whose parent's counter is greater than zero already, you don't need to
> > walk the path above the parent.
> 
> We still can do it with the flag only, pm_runtime_set_memalloc_noio(dev, true)
> can return immediately if one parent or the 'dev' flag is true.
> 
> But considered that the pm_runtime_set_memalloc_noio(dev, false) is only
> called in a very infrequent path(network/block device->remove()), looks the
> introduced cost isn't worthy of the obtained advantage.
> 
> So could you accept not introducing counter? and I will update with the
> above improvement you suggested.

Well, please see my other message I sent a while ago. :-)

Thanks,
Rafael


-- 
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v6 2/6] PM / Runtime: introduce pm_runtime_set_memalloc_noio()
  2012-11-28 10:06       ` Rafael J. Wysocki
@ 2012-11-28 13:51         ` Ming Lei
  0 siblings, 0 replies; 19+ messages in thread
From: Ming Lei @ 2012-11-28 13:51 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: linux-pm, linux-kernel, Alan Stern, Oliver Neukum, Minchan Kim,
	Greg Kroah-Hartman, Jens Axboe, David S. Miller, Andrew Morton,
	netdev, linux-usb, linux-mm

On Wed, Nov 28, 2012 at 6:06 PM, Rafael J. Wysocki <rjw@sisk.pl> wrote:
>
> Well, it may be unfrequent, but does it mean it has to do things that may
> be avoided (ie. walking the children of every node in the path in some cases)?

I agree so without introducing extra cost, :-)

> I don't really think that the counters would cost us that much anyway.

On ARM v7, sizeof(struct device) becomes 376 from 368 after introducing
'unsigned int            noio_cnt;' to 'struct dev_pm_info', and total memory
increases about 3752bytes in a small configuration(about 494 device instance).
The actual memory increase should be more than the data because 'struct device'
is generally embedded into other concrete device structure.

>> Also looks the current implementation of pm_runtime_set_memalloc_noio()
>> is simple and clean enough with the flag, IMO.
>
> I know you always know better. :-)

We still need to consider cost and the function calling frequency, :-)

>
>> > I would use the flag only to store the information that
>> > pm_runtime_set_memalloc_noio(dev, true) has been run for this device directly
>> > and I'd use a counter for everything else.
>> >
>> > That is, have power.memalloc_count that would be incremented when (1)
>> > pm_runtime_set_memalloc_noio(dev, true) is called for that device and (2) when
>> > power.memalloc_count for one of its children changes from 0 to 1 (and
>> > analogously for decrementation).  Then, check the counter in rpm_callback().
>>
>> Sorry, could you explain in a bit detail why we need the counter? Looks only
>> checking the flag in rpm_callback() is enough, doesn't it?
>
> Why would I want to use power.memalloc_count in addition to the
> power.memalloc_noio flag?
>
> Consider this:
>
> pm_runtime_set_memalloc_noio(dev):
>         return if power.memalloc_noio is set
>         set power.memalloc_noio
>   loop:
>         increment power.memalloc_count
>         if power.memalloc_count is 1 now switch to parent and go to loop

I am wondering if the above should be changed to below because the child
count of memalloc_noio device need to be recorded.

pm_runtime_set_memalloc_noio(dev):
         return if power.memalloc_noio is set
         set power.memalloc_noio
loop:
         increment power.memalloc_count
         switch to parent and go to loop

So pm_runtime_set_memalloc_noio(dev) will become worse than
the improved pm_runtime_set_memalloc_noio(dev, true), which
can return immediately if one dev or parent's flag is true.

> pm_runtime_clear_memalloc_noio(dev):
>         return if power.memalloc_noio is unset
>         unset power.memalloc_noio
>   loop:
>         decrement power.memalloc_count
>         if power.memalloc_count is 0 now switch to parent and go to loop

The above will perform well than pm_runtime_set_memalloc_noio(dev, false),
because the above avoids to walk children of device.

So one becomes worse and another becomes better, :-)

Also the children count of one device is generally very small, less than
10 for most devices, see the data obtained in one common x86 pc(thinkpad
t410) from below link:

        http://kernel.ubuntu.com/~ming/up/t410-dev-child-cnt.log

- about 8 devices whose child count is more than 10, top three are 18, 17 ,12,
and all the three are root devices.

- about 117 devices whose child count is between 1 and 9

- other 501 devices whose child count is zero

>From above data, walking device children should have not much effect on
performance of pm_runtime_set_memalloc_noio(), which is also called in
very infrequent path.

> Looks kind of simpler, doesn't it?

Looks simpler, but more code lines than single
pm_runtime_set_memalloc_noio(), :-)

>
> And why rpm_callback() should check power.memalloc_count instead of the count?
> Because power.memalloc_noio will only be set for devices that
> pm_runtime_set_memalloc_noio(dev) was called for directly (not necessarily for
> the parents).
>
> And that works even if someone calls any of them twice in a row for the same
> device (presumably by mistake) and doesn't have to make any assumptions
> about devices it is called for.

IMO, we can ignore the mistake usage because the function is called only
in network/block core code currently, not by individual driver.

>
>> > Besides, don't you need to check children for the arg device itself?
>>
>> It isn't needed since the children of network/block device can't be
>> involved of the deadlock in runtime PM path.
>>
>> Also, the function is only called by network device or block device
>> subsystem, both the two kind of device are class device and should
>> have no children.
>
> OK, so not walking the arg device's children is an optimization related to
> some assumptions regarding who's supposed to use this routine.  That should
> be clearly documented.

I think the patch already documents it in the comment of
pm_runtime_set_memalloc_noio().

Thanks,
--
Ming Lei

^ permalink raw reply	[flat|nested] 19+ messages in thread

end of thread, other threads:[~2012-11-28 13:51 UTC | newest]

Thread overview: 19+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-11-24 12:59 [PATCH v6 0/6] solve deadlock caused by memory allocation with I/O Ming Lei
2012-11-24 12:59 ` [PATCH v6 1/6] mm: teach mm by current context info to not do I/O during memory allocation Ming Lei
2012-11-24 12:59 ` [PATCH v6 2/6] PM / Runtime: introduce pm_runtime_set_memalloc_noio() Ming Lei
2012-11-27 21:19   ` Rafael J. Wysocki
2012-11-27 21:46     ` Rafael J. Wysocki
2012-11-28  3:57     ` Ming Lei
2012-11-28 10:06       ` Rafael J. Wysocki
2012-11-28 13:51         ` Ming Lei
2012-11-28  4:34     ` Ming Lei
2012-11-28  9:29       ` Rafael J. Wysocki
2012-11-28  9:47         ` Ming Lei
2012-11-28 10:15           ` Rafael J. Wysocki
2012-11-24 12:59 ` [PATCH v6 3/6] block/genhd.c: apply pm_runtime_set_memalloc_noio on block devices Ming Lei
2012-11-24 12:59 ` [PATCH v6 4/6] net/core: apply pm_runtime_set_memalloc_noio on network devices Ming Lei
2012-11-24 12:59 ` [PATCH v6 5/6] PM / Runtime: force memory allocation with no I/O during Runtime PM callbcack Ming Lei
2012-11-27 21:24   ` Rafael J. Wysocki
2012-11-28  3:06     ` Ming Lei
2012-11-24 12:59 ` [PATCH v6 6/6] USB: forbid memory allocation with I/O during bus reset Ming Lei
2012-11-27 20:25 ` [PATCH v6 0/6] solve deadlock caused by memory allocation with I/O Andrew Morton

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).