linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v3 0/6] solve deadlock caused by memory allocation with I/O
@ 2012-10-29 12:23 Ming Lei
  2012-10-29 12:23 ` [PATCH v3 1/6] mm: teach mm by current context info to not do I/O during memory allocation Ming Lei
                   ` (5 more replies)
  0 siblings, 6 replies; 23+ messages in thread
From: Ming Lei @ 2012-10-29 12:23 UTC (permalink / raw)
  To: linux-kernel
  Cc: Alan Stern, Oliver Neukum, Minchan Kim, Greg Kroah-Hartman,
	Rafael J. Wysocki, Jens Axboe, David S. Miller, Andrew Morton,
	netdev, linux-usb, linux-pm, linux-mm

This patchset try to solve one deadlock problem which might be caused
by memory allocation with block I/O during runtime resume and block device
error handling path. Traditionly, the problem is addressed by passing
GFP_NOIO statically to mm, but that is not a effective solution, see
detailed description in patch 1's commit log.

This patch set introduces one process flag and trys to fix one deadlock
problem on block device/network device during runtime resume or usb bus reset.

The 1st one is the change on include/sched.h and mm.

The 2nd patch introduces the flag of memalloc_noio_resume on 'dev_pm_info',
and pm_runtime_set_memalloc_noio(), so that PM Core can teach mm to not
allocate mm with GFP_IOFS during the runtime_resume callback only on
device with the flag set.

The following 2 patches apply the introduced pm_runtime_set_memalloc_noio()
to mark all devices as memalloc_noio_resume in the path from the block or
network device to the root device in device tree.

The last 2 patches are applied again PM and USB subsystem to demonstrate
how to use the introduced mechanism to fix the deadlock problem.

V3:
	- patch 2/6 and 5/6 changed, see their commit log
	- remove RFC from title since several guys have expressed that
	it is a reasonable solution
V2:
        - remove changes on 'may_writepage' and 'may_swap'(1/6)
        - unset GFP_IOFS in try_to_free_pages() path(1/6)
        - introduce pm_runtime_set_memalloc_noio()
        - only apply the meachnism on block/network device and its ancestors
        for runtime resume context
V1:
        - take Minchan's change to avoid the check in alloc_page hot path
        - change the helpers' style into save/restore as suggested by Alan
        - memory allocation with no io in usb bus reset path for all devices
        as suggested by Greg and Oliver

 block/genhd.c                |    8 ++++
 drivers/base/power/runtime.c |   88 +++++++++++++++++++++++++++++++++++++++++-
 drivers/usb/core/hub.c       |   15 +++++++
 include/linux/pm.h           |    1 +
 include/linux/pm_runtime.h   |    5 +++
 include/linux/sched.h        |   10 +++++
 mm/page_alloc.c              |   10 ++++-
 mm/vmscan.c                  |   12 ++++++
 net/core/net-sysfs.c         |    5 +++
 9 files changed, 152 insertions(+), 2 deletions(-)


Thanks,
--
Ming Lei


^ permalink raw reply	[flat|nested] 23+ messages in thread

* [PATCH v3 1/6] mm: teach mm by current context info to not do I/O during memory allocation
  2012-10-29 12:23 [PATCH v3 0/6] solve deadlock caused by memory allocation with I/O Ming Lei
@ 2012-10-29 12:23 ` Ming Lei
  2012-10-29 12:23 ` [PATCH v3 2/6] PM / Runtime: introduce pm_runtime_set[get]_memalloc_noio() Ming Lei
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 23+ messages in thread
From: Ming Lei @ 2012-10-29 12:23 UTC (permalink / raw)
  To: linux-kernel
  Cc: Alan Stern, Oliver Neukum, Minchan Kim, Greg Kroah-Hartman,
	Rafael J. Wysocki, Jens Axboe, David S. Miller, Andrew Morton,
	netdev, linux-usb, linux-pm, linux-mm, Ming Lei, Jiri Kosina,
	Mel Gorman, KAMEZAWA Hiroyuki, Michal Hocko, Ingo Molnar,
	Peter Zijlstra

This patch introduces PF_MEMALLOC_NOIO on process flag('flags' field of
'struct task_struct'), so that the flag can be set by one task
to avoid doing I/O inside memory allocation in the task's context.

The patch trys to solve one deadlock problem caused by block device,
and the problem may happen at least in the below situations:

- during block device runtime resume, if memory allocation with
GFP_KERNEL is called inside runtime resume callback of any one
of its ancestors(or the block device itself), the deadlock may be
triggered inside the memory allocation since it might not complete
until the block device becomes active and the involed page I/O finishes.
The situation is pointed out first by Alan Stern. It is not a good
approach to convert all GFP_KERNEL[1] in the path into GFP_NOIO because
several subsystems may be involved(for example, PCI, USB and SCSI may
be involved for usb mass stoarage device, network devices involved too
in the iSCSI case)

- during error handling of usb mass storage deivce, USB bus reset
will be put on the device, so there shouldn't have any
memory allocation with GFP_KERNEL during USB bus reset, otherwise
the deadlock similar with above may be triggered. Unfortunately, any
usb device may include one mass storage interface in theory, so it
requires all usb interface drivers to handle the situation. In fact,
most usb drivers don't know how to handle bus reset on the device
and don't provide .pre_set() and .post_reset() callback at all, so
USB core has to unbind and bind driver for these devices. So it
is still not practical to resort to GFP_NOIO for solving the problem.

Also the introduced solution can be used by block subsystem or block
drivers too, for example, set the PF_MEMALLOC_NOIO flag before doing
actual I/O transfer.

It is not a good idea to convert all these GFP_KERNEL in the
affected path into GFP_NOIO because these functions doing that may be
implemented as library and will be called in many other contexts.

In fact, memalloc_noio() can convert some of current static GFP_NOIO
allocation into GFP_KERNEL back in other non-affected contexts, at least
almost all GFP_NOIO in USB subsystem can be converted into GFP_KERNEL
after applying the approach and make allocation with GFP_IO
only happen in runtime resume/bus reset/block I/O transfer contexts
generally.

[1], several GFP_KERNEL allocation examples in runtime resume path

- pci subsystem
acpi_os_allocate
	<-acpi_ut_allocate
		<-ACPI_ALLOCATE_ZEROED
			<-acpi_evaluate_object
				<-__acpi_bus_set_power
					<-acpi_bus_set_power
						<-acpi_pci_set_power_state
							<-platform_pci_set_power_state
								<-pci_platform_power_transition
									<-__pci_complete_power_transition
										<-pci_set_power_state
											<-pci_restore_standard_config
												<-pci_pm_runtime_resume
- usb subsystem
usb_get_status
	<-finish_port_resume
		<-usb_port_resume
			<-generic_resume
				<-usb_resume_device
					<-usb_resume_both
						<-usb_runtime_resume

- some individual usb drivers
usblp, uvc, gspca, most of dvb-usb-v2 media drivers, cpia2, az6007, ....

That is just what I have found.  Unfortunately, this allocation can
only be found by human being now, and there should be many not found
since any function in the resume path(call tree) may allocate memory
with GFP_KERNEL.

Cc: Alan Stern <stern@rowland.harvard.edu>
Cc: Oliver Neukum <oneukum@suse.de>
Cc: Jiri Kosina <jiri.kosina@suse.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
Signed-off-by: Minchan Kim <minchan@kernel.org>
Signed-off-by: Ming Lei <ming.lei@canonical.com>

---
v3:
	- no change
v2:
        - remove changes on 'may_writepage' and 'may_swap' because that
          isn't related with the patchset, and can't introduce I/O in
          allocation path if GFP_IOFS is unset, so handing 'may_swap'
          and may_writepage on GFP_NOIO or GFP_NOFS  should be a
          mm internal thing, and let mm guys deal with that, :-).

          Looks clearing the two may_XXX flag only excludes dirty pages
	  and anon pages for relaiming, and the behaviour should be decided
          by GFP FLAG, IMO.

        - unset GFP_IOFS in try_to_free_pages() path since
          alloc_page_buffers()
          and dma_alloc_from_contiguous may drop into the path, as
          pointed by KAMEZAWA Hiroyuki
v1:
        - take Minchan's change to avoid the check in alloc_page hot
          path

        - change the helpers' style into save/restore as suggested by
          Alan Stern
---
 include/linux/sched.h |   10 ++++++++++
 mm/page_alloc.c       |   10 +++++++++-
 mm/vmscan.c           |   12 ++++++++++++
 3 files changed, 31 insertions(+), 1 deletion(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index fb27acd..283fe86 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1805,6 +1805,7 @@ extern void thread_group_times(struct task_struct *p, cputime_t *ut, cputime_t *
 #define PF_FROZEN	0x00010000	/* frozen for system suspend */
 #define PF_FSTRANS	0x00020000	/* inside a filesystem transaction */
 #define PF_KSWAPD	0x00040000	/* I am kswapd */
+#define PF_MEMALLOC_NOIO 0x00080000	/* Allocating memory without IO involved */
 #define PF_LESS_THROTTLE 0x00100000	/* Throttle me less: I clean memory */
 #define PF_KTHREAD	0x00200000	/* I am a kernel thread */
 #define PF_RANDOMIZE	0x00400000	/* randomize virtual address space */
@@ -1842,6 +1843,15 @@ extern void thread_group_times(struct task_struct *p, cputime_t *ut, cputime_t *
 #define tsk_used_math(p) ((p)->flags & PF_USED_MATH)
 #define used_math() tsk_used_math(current)
 
+#define memalloc_noio() (current->flags & PF_MEMALLOC_NOIO)
+#define memalloc_noio_save(flag) do { \
+	(flag) = current->flags & PF_MEMALLOC_NOIO; \
+	current->flags |= PF_MEMALLOC_NOIO; \
+} while (0)
+#define memalloc_noio_restore(flag) do { \
+	current->flags = (current->flags & ~PF_MEMALLOC_NOIO) | flag; \
+} while (0)
+
 /*
  * task->jobctl flags
  */
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 45c916b..548d41c 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -2634,10 +2634,18 @@ retry_cpuset:
 	page = get_page_from_freelist(gfp_mask|__GFP_HARDWALL, nodemask, order,
 			zonelist, high_zoneidx, alloc_flags,
 			preferred_zone, migratetype);
-	if (unlikely(!page))
+	if (unlikely(!page)) {
+		/*
+		 * Resume, block IO and its error handling path
+		 * can deadlock because I/O on the device might not
+		 * complete.
+		 */
+		if (unlikely(memalloc_noio()))
+			gfp_mask &= ~GFP_IOFS;
 		page = __alloc_pages_slowpath(gfp_mask, order,
 				zonelist, high_zoneidx, nodemask,
 				preferred_zone, migratetype);
+	}
 
 	trace_mm_page_alloc(page, order, gfp_mask, migratetype);
 
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 10090c8..035088a 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -2304,6 +2304,12 @@ unsigned long try_to_free_pages(struct zonelist *zonelist, int order,
 		.gfp_mask = sc.gfp_mask,
 	};
 
+	if (unlikely(memalloc_noio())) {
+		gfp_mask &= ~GFP_IOFS;
+		sc.gfp_mask = gfp_mask;
+		shrink.gfp_mask = sc.gfp_mask;
+	}
+
 	throttle_direct_reclaim(gfp_mask, zonelist, nodemask);
 
 	/*
@@ -3304,6 +3310,12 @@ static int __zone_reclaim(struct zone *zone, gfp_t gfp_mask, unsigned int order)
 	};
 	unsigned long nr_slab_pages0, nr_slab_pages1;
 
+	if (unlikely(memalloc_noio())) {
+		gfp_mask &= ~GFP_IOFS;
+		sc.gfp_mask = gfp_mask;
+		shrink.gfp_mask = sc.gfp_mask;
+	}
+
 	cond_resched();
 	/*
 	 * We need to be able to allocate from the reserves for RECLAIM_SWAP
-- 
1.7.9.5


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH v3 2/6] PM / Runtime: introduce pm_runtime_set[get]_memalloc_noio()
  2012-10-29 12:23 [PATCH v3 0/6] solve deadlock caused by memory allocation with I/O Ming Lei
  2012-10-29 12:23 ` [PATCH v3 1/6] mm: teach mm by current context info to not do I/O during memory allocation Ming Lei
@ 2012-10-29 12:23 ` Ming Lei
  2012-10-29 15:41   ` Alan Stern
  2012-10-29 12:23 ` [PATCH v3 3/6] block/genhd.c: apply pm_runtime_set_memalloc_noio on block devices Ming Lei
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 23+ messages in thread
From: Ming Lei @ 2012-10-29 12:23 UTC (permalink / raw)
  To: linux-kernel
  Cc: Alan Stern, Oliver Neukum, Minchan Kim, Greg Kroah-Hartman,
	Rafael J. Wysocki, Jens Axboe, David S. Miller, Andrew Morton,
	netdev, linux-usb, linux-pm, linux-mm, Ming Lei

The patch introduces the flag of memalloc_noio_resume in
'struct dev_pm_info' to help PM core to teach mm not allocating
memory with GFP_KERNEL flag for avoiding probable deadlock
problem.

As explained in the comment, any GFP_KERNEL allocation inside
runtime_resume on any one of device in the path from one block
or network device to the root device in the device tree may cause
deadlock, the introduced pm_runtime_set_memalloc_noio() sets or
clears the flag on device of the path recursively.

This patch also introduces pm_runtime_get_memalloc_noio() because
the flag may be accessed in block device's error handling path
(for example, usb device reset)

Cc: Alan Stern <stern@rowland.harvard.edu>
Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
Signed-off-by: Ming Lei <ming.lei@canonical.com>
---
v3:
	- introduce pm_runtime_get_memalloc_noio()
	- hold one global lock on pm_runtime_set_memalloc_noio
	- hold device power lock when accessing memalloc_noio_resume
	  flag suggested by Alan Stern
	- implement pm_runtime_set_memalloc_noio without recursion
	  suggested by Alan Stern
v2:
	- introduce pm_runtime_set_memalloc_noio()
---
 drivers/base/power/runtime.c |   72 ++++++++++++++++++++++++++++++++++++++++++
 include/linux/pm.h           |    1 +
 include/linux/pm_runtime.h   |    5 +++
 3 files changed, 78 insertions(+)

diff --git a/drivers/base/power/runtime.c b/drivers/base/power/runtime.c
index 3148b10..9fa6ea7 100644
--- a/drivers/base/power/runtime.c
+++ b/drivers/base/power/runtime.c
@@ -124,6 +124,78 @@ unsigned long pm_runtime_autosuspend_expiration(struct device *dev)
 }
 EXPORT_SYMBOL_GPL(pm_runtime_autosuspend_expiration);
 
+/*
+ * pm_runtime_get_memalloc_noio - Get a device's memalloc_noio flag.
+ * @dev: Device to handle.
+ *
+ * Return the device's memalloc_noio flag.
+ *
+ * The device power lock is held because bitfield is not SMP-safe.
+ */
+bool pm_runtime_get_memalloc_noio(struct device *dev)
+{
+	bool ret;
+	spin_lock_irq(&dev->power.lock);
+	ret = dev->power.memalloc_noio_resume;
+	spin_unlock_irq(&dev->power.lock);
+	return ret;
+}
+EXPORT_SYMBOL_GPL(pm_runtime_get_memalloc_noio);
+
+static int dev_memalloc_noio(struct device *dev, void *data)
+{
+	return pm_runtime_get_memalloc_noio(dev);
+}
+
+/*
+ * pm_runtime_set_memalloc_noio - Set a device's memalloc_noio flag.
+ * @dev: Device to handle.
+ * @enable: True for setting the flag and False for clearing the flag.
+ *
+ * Set the flag for all devices in the path from the device to the
+ * root device in the device tree if @enable is true, otherwise clear
+ * the flag for devices in the path which sibliings don't set the flag.
+ *
+ * The function should only be called by block device, or network
+ * device driver for solving the deadlock problem during runtime
+ * resume:
+ * 	if memory allocation with GFP_KERNEL is called inside runtime
+ * 	resume callback of any one of its ancestors(or the block device
+ * 	itself), the deadlock may be triggered inside the memory
+ * 	allocation since it might not complete until the block device
+ * 	becomes active and the involed page I/O finishes. The situation
+ * 	is pointed out first by Alan Stern. Network device are involved
+ * 	in iSCSI kind of situation.
+ *
+ * The lock of dev_hotplug_mutex is held in the function for handling
+ * hotplug race because pm_runtime_set_memalloc_noio() may be called
+ * in async probe().
+ */
+void pm_runtime_set_memalloc_noio(struct device *dev, bool enable)
+{
+	static DEFINE_MUTEX(dev_hotplug_mutex);
+
+	mutex_lock(&dev_hotplug_mutex);
+	while (dev) {
+		/* hold power lock since bitfield is not SMP-safe. */
+		spin_lock_irq(&dev->power.lock);
+		dev->power.memalloc_noio_resume = enable;
+		spin_unlock_irq(&dev->power.lock);
+
+		dev = dev->parent;
+
+		/* only clear the flag for one device if all
+		 * children of the device don't set the flag.
+		 */
+		if (!dev || (!enable &&
+			     device_for_each_child(dev, NULL,
+						   dev_memalloc_noio)))
+			break;
+	}
+	mutex_unlock(&dev_hotplug_mutex);
+}
+EXPORT_SYMBOL_GPL(pm_runtime_set_memalloc_noio);
+
 /**
  * rpm_check_suspend_allowed - Test whether a device may be suspended.
  * @dev: Device to test.
diff --git a/include/linux/pm.h b/include/linux/pm.h
index 03d7bb1..d104579 100644
--- a/include/linux/pm.h
+++ b/include/linux/pm.h
@@ -538,6 +538,7 @@ struct dev_pm_info {
 	unsigned int		irq_safe:1;
 	unsigned int		use_autosuspend:1;
 	unsigned int		timer_autosuspends:1;
+	unsigned int		memalloc_noio_resume:1;
 	enum rpm_request	request;
 	enum rpm_status		runtime_status;
 	int			runtime_error;
diff --git a/include/linux/pm_runtime.h b/include/linux/pm_runtime.h
index f271860..b522b09 100644
--- a/include/linux/pm_runtime.h
+++ b/include/linux/pm_runtime.h
@@ -47,6 +47,8 @@ extern void pm_runtime_set_autosuspend_delay(struct device *dev, int delay);
 extern unsigned long pm_runtime_autosuspend_expiration(struct device *dev);
 extern void pm_runtime_update_max_time_suspended(struct device *dev,
 						 s64 delta_ns);
+extern bool pm_runtime_get_memalloc_noio(struct device *dev);
+extern void pm_runtime_set_memalloc_noio(struct device *dev, bool enable);
 
 static inline bool pm_children_suspended(struct device *dev)
 {
@@ -149,6 +151,9 @@ static inline void pm_runtime_set_autosuspend_delay(struct device *dev,
 						int delay) {}
 static inline unsigned long pm_runtime_autosuspend_expiration(
 				struct device *dev) { return 0; }
+static inline bool pm_runtime_get_memalloc_noio(struct device *dev) { return false; }
+static inline void pm_runtime_set_memalloc_noio(struct device *dev,
+						bool enable){}
 
 #endif /* !CONFIG_PM_RUNTIME */
 
-- 
1.7.9.5


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH v3 3/6] block/genhd.c: apply pm_runtime_set_memalloc_noio on block devices
  2012-10-29 12:23 [PATCH v3 0/6] solve deadlock caused by memory allocation with I/O Ming Lei
  2012-10-29 12:23 ` [PATCH v3 1/6] mm: teach mm by current context info to not do I/O during memory allocation Ming Lei
  2012-10-29 12:23 ` [PATCH v3 2/6] PM / Runtime: introduce pm_runtime_set[get]_memalloc_noio() Ming Lei
@ 2012-10-29 12:23 ` Ming Lei
  2012-10-29 12:23 ` [PATCH v3 4/6] net/core: apply pm_runtime_set_memalloc_noio on network devices Ming Lei
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 23+ messages in thread
From: Ming Lei @ 2012-10-29 12:23 UTC (permalink / raw)
  To: linux-kernel
  Cc: Alan Stern, Oliver Neukum, Minchan Kim, Greg Kroah-Hartman,
	Rafael J. Wysocki, Jens Axboe, David S. Miller, Andrew Morton,
	netdev, linux-usb, linux-pm, linux-mm, Ming Lei

This patch applyes the introduced pm_runtime_set_memalloc_noio on
block device so that PM core will teach mm to not allocate memory with
GFP_IOFS when calling the runtime_resume callback for block devices.

Cc: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Ming Lei <ming.lei@canonical.com>
---
 block/genhd.c |    8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/block/genhd.c b/block/genhd.c
index 9e02cd6..c5f10ea 100644
--- a/block/genhd.c
+++ b/block/genhd.c
@@ -18,6 +18,7 @@
 #include <linux/mutex.h>
 #include <linux/idr.h>
 #include <linux/log2.h>
+#include <linux/pm_runtime.h>
 
 #include "blk.h"
 
@@ -519,6 +520,12 @@ static void register_disk(struct gendisk *disk)
 
 	dev_set_name(ddev, disk->disk_name);
 
+	/* avoid probable deadlock caused by allocate memory with
+	 * GFP_KERNEL in runtime_resume callback of its all ancestor
+	 * deivces
+	 */
+	pm_runtime_set_memalloc_noio(ddev, true);
+
 	/* delay uevents, until we scanned partition table */
 	dev_set_uevent_suppress(ddev, 1);
 
@@ -661,6 +668,7 @@ void del_gendisk(struct gendisk *disk)
 	disk->driverfs_dev = NULL;
 	if (!sysfs_deprecated)
 		sysfs_remove_link(block_depr, dev_name(disk_to_dev(disk)));
+	pm_runtime_set_memalloc_noio(disk_to_dev(disk), false);
 	device_del(disk_to_dev(disk));
 }
 EXPORT_SYMBOL(del_gendisk);
-- 
1.7.9.5


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH v3 4/6] net/core: apply pm_runtime_set_memalloc_noio on network devices
  2012-10-29 12:23 [PATCH v3 0/6] solve deadlock caused by memory allocation with I/O Ming Lei
                   ` (2 preceding siblings ...)
  2012-10-29 12:23 ` [PATCH v3 3/6] block/genhd.c: apply pm_runtime_set_memalloc_noio on block devices Ming Lei
@ 2012-10-29 12:23 ` Ming Lei
  2012-10-29 15:44   ` Alan Stern
  2012-10-29 12:23 ` [PATCH v3 5/6] PM / Runtime: force memory allocation with no I/O during runtime_resume callbcack Ming Lei
  2012-10-29 12:24 ` [PATCH v3 6/6] USB: forbid memory allocation with I/O during bus reset Ming Lei
  5 siblings, 1 reply; 23+ messages in thread
From: Ming Lei @ 2012-10-29 12:23 UTC (permalink / raw)
  To: linux-kernel
  Cc: Alan Stern, Oliver Neukum, Minchan Kim, Greg Kroah-Hartman,
	Rafael J. Wysocki, Jens Axboe, David S. Miller, Andrew Morton,
	netdev, linux-usb, linux-pm, linux-mm, Ming Lei, Eric Dumazet,
	David Decotigny, Tom Herbert, Ingo Molnar

Deadlock might be caused by allocating memory with GFP_KERNEL in
runtime_resume callback of network devices in iSCSI situation, so
mark network devices and its ancestor as 'memalloc_noio_resume'
with the introduced pm_runtime_set_memalloc_noio().

Cc: "David S. Miller" <davem@davemloft.net>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: David Decotigny <david.decotigny@google.com>
Cc: Tom Herbert <therbert@google.com>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Ming Lei <ming.lei@canonical.com>
---
 net/core/net-sysfs.c |    5 +++++
 1 file changed, 5 insertions(+)

diff --git a/net/core/net-sysfs.c b/net/core/net-sysfs.c
index bcf02f6..9aba5be 100644
--- a/net/core/net-sysfs.c
+++ b/net/core/net-sysfs.c
@@ -22,6 +22,7 @@
 #include <linux/vmalloc.h>
 #include <linux/export.h>
 #include <linux/jiffies.h>
+#include <linux/pm_runtime.h>
 #include <net/wext.h>
 
 #include "net-sysfs.h"
@@ -1386,6 +1387,8 @@ void netdev_unregister_kobject(struct net_device * net)
 
 	remove_queue_kobjects(net);
 
+	pm_runtime_set_memalloc_noio(dev, false);
+
 	device_del(dev);
 }
 
@@ -1411,6 +1414,8 @@ int netdev_register_kobject(struct net_device *net)
 	*groups++ = &netstat_group;
 #endif /* CONFIG_SYSFS */
 
+	pm_runtime_set_memalloc_noio(dev, true);
+
 	error = device_add(dev);
 	if (error)
 		return error;
-- 
1.7.9.5


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH v3 5/6] PM / Runtime: force memory allocation with no I/O during runtime_resume callbcack
  2012-10-29 12:23 [PATCH v3 0/6] solve deadlock caused by memory allocation with I/O Ming Lei
                   ` (3 preceding siblings ...)
  2012-10-29 12:23 ` [PATCH v3 4/6] net/core: apply pm_runtime_set_memalloc_noio on network devices Ming Lei
@ 2012-10-29 12:23 ` Ming Lei
  2012-10-29 12:24 ` [PATCH v3 6/6] USB: forbid memory allocation with I/O during bus reset Ming Lei
  5 siblings, 0 replies; 23+ messages in thread
From: Ming Lei @ 2012-10-29 12:23 UTC (permalink / raw)
  To: linux-kernel
  Cc: Alan Stern, Oliver Neukum, Minchan Kim, Greg Kroah-Hartman,
	Rafael J. Wysocki, Jens Axboe, David S. Miller, Andrew Morton,
	netdev, linux-usb, linux-pm, linux-mm, Ming Lei

This patch applies the introduced memalloc_noio_save() and
memalloc_noio_restore() to force memory allocation with no I/O
during runtime_resume callback on device which is marked as
memalloc_noio_resume.

Cc: Alan Stern <stern@rowland.harvard.edu>
Cc: Oliver Neukum <oneukum@suse.de>
Cc: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: Ming Lei <ming.lei@canonical.com>
---
 drivers/base/power/runtime.c |   16 +++++++++++++++-
 1 file changed, 15 insertions(+), 1 deletion(-)

diff --git a/drivers/base/power/runtime.c b/drivers/base/power/runtime.c
index 9fa6ea7..c9e26b9 100644
--- a/drivers/base/power/runtime.c
+++ b/drivers/base/power/runtime.c
@@ -575,6 +575,7 @@ static int rpm_resume(struct device *dev, int rpmflags)
 	int (*callback)(struct device *);
 	struct device *parent = NULL;
 	int retval = 0;
+	unsigned int noio_flag;
 
 	trace_rpm_resume(dev, rpmflags);
 
@@ -724,7 +725,20 @@ static int rpm_resume(struct device *dev, int rpmflags)
 	if (!callback && dev->driver && dev->driver->pm)
 		callback = dev->driver->pm->runtime_resume;
 
-	retval = rpm_callback(callback, dev);
+	/*
+	 * Deadlock might be caused if memory allocation with GFP_KERNEL
+	 * happens inside runtime_resume callback of one block device's
+	 * ancestor or the block device itself. Network device might be
+	 * thought as part of iSCSI block device, so network device and
+	 * its ancestor should be marked as memalloc_noio_resume.
+	 */
+	if (dev->power.memalloc_noio_resume) {
+		memalloc_noio_save(noio_flag);
+		retval = rpm_callback(callback, dev);
+		memalloc_noio_restore(noio_flag);
+	} else {
+		retval = rpm_callback(callback, dev);
+	}
 	if (retval) {
 		__update_runtime_status(dev, RPM_SUSPENDED);
 		pm_runtime_cancel_pending(dev);
-- 
1.7.9.5


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH v3 6/6] USB: forbid memory allocation with I/O during bus reset
  2012-10-29 12:23 [PATCH v3 0/6] solve deadlock caused by memory allocation with I/O Ming Lei
                   ` (4 preceding siblings ...)
  2012-10-29 12:23 ` [PATCH v3 5/6] PM / Runtime: force memory allocation with no I/O during runtime_resume callbcack Ming Lei
@ 2012-10-29 12:24 ` Ming Lei
  5 siblings, 0 replies; 23+ messages in thread
From: Ming Lei @ 2012-10-29 12:24 UTC (permalink / raw)
  To: linux-kernel
  Cc: Alan Stern, Oliver Neukum, Minchan Kim, Greg Kroah-Hartman,
	Rafael J. Wysocki, Jens Axboe, David S. Miller, Andrew Morton,
	netdev, linux-usb, linux-pm, linux-mm, Ming Lei

If one storage interface or usb network interface(iSCSI case)
exists in current configuration, memory allocation with
GFP_KERNEL during usb_device_reset() might trigger I/O transfer
on the storage interface itself and cause deadlock because
the 'us->dev_mutex' is held in .pre_reset() and the storage
interface can't do I/O transfer when the reset is triggered
by other interface, or the error handling can't be completed
if the reset is triggered by the storage itself(error handling path).

Cc: Alan Stern <stern@rowland.harvard.edu>
Cc: Oliver Neukum <oneukum@suse.de>
Signed-off-by: Ming Lei <ming.lei@canonical.com>
---
v3:
	- check usbnet device or usb mass storage device by
	'dev->power.memalloc_noio_resume' as suggested by Alan Stern
---
 drivers/usb/core/hub.c |   15 +++++++++++++++
 1 file changed, 15 insertions(+)

diff --git a/drivers/usb/core/hub.c b/drivers/usb/core/hub.c
index 5b131b6..5aea807 100644
--- a/drivers/usb/core/hub.c
+++ b/drivers/usb/core/hub.c
@@ -5044,6 +5044,8 @@ int usb_reset_device(struct usb_device *udev)
 {
 	int ret;
 	int i;
+	unsigned int uninitialized_var(noio_flag);
+	bool noio_set = false;
 	struct usb_host_config *config = udev->actconfig;
 
 	if (udev->state == USB_STATE_NOTATTACHED ||
@@ -5053,6 +5055,17 @@ int usb_reset_device(struct usb_device *udev)
 		return -EINVAL;
 	}
 
+	/*
+	 * Don't allocate memory with GFP_KERNEL in current
+	 * context to avoid possible deadlock if usb mass
+	 * storage interface or usbnet interface(iSCSI case)
+	 * is included in current configuration.
+	 */
+	if (pm_runtime_get_memalloc_noio(&udev->dev)) {
+		memalloc_noio_save(noio_flag);
+		noio_set = true;
+	}
+
 	/* Prevent autosuspend during the reset */
 	usb_autoresume_device(udev);
 
@@ -5097,6 +5110,8 @@ int usb_reset_device(struct usb_device *udev)
 	}
 
 	usb_autosuspend_device(udev);
+	if (noio_set)
+		memalloc_noio_restore(noio_flag);
 	return ret;
 }
 EXPORT_SYMBOL_GPL(usb_reset_device);
-- 
1.7.9.5


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* Re: [PATCH v3 2/6] PM / Runtime: introduce pm_runtime_set[get]_memalloc_noio()
  2012-10-29 12:23 ` [PATCH v3 2/6] PM / Runtime: introduce pm_runtime_set[get]_memalloc_noio() Ming Lei
@ 2012-10-29 15:41   ` Alan Stern
  2012-10-30  3:21     ` Ming Lei
  0 siblings, 1 reply; 23+ messages in thread
From: Alan Stern @ 2012-10-29 15:41 UTC (permalink / raw)
  To: Ming Lei
  Cc: linux-kernel, Oliver Neukum, Minchan Kim, Greg Kroah-Hartman,
	Rafael J. Wysocki, Jens Axboe, David S. Miller, Andrew Morton,
	netdev, linux-usb, linux-pm, linux-mm

On Mon, 29 Oct 2012, Ming Lei wrote:

> The patch introduces the flag of memalloc_noio_resume in
> 'struct dev_pm_info' to help PM core to teach mm not allocating
> memory with GFP_KERNEL flag for avoiding probable deadlock
> problem.
> 
> As explained in the comment, any GFP_KERNEL allocation inside
> runtime_resume on any one of device in the path from one block
> or network device to the root device in the device tree may cause
> deadlock, the introduced pm_runtime_set_memalloc_noio() sets or
> clears the flag on device of the path recursively.
> 
> This patch also introduces pm_runtime_get_memalloc_noio() because
> the flag may be accessed in block device's error handling path
> (for example, usb device reset)

> +/*
> + * pm_runtime_get_memalloc_noio - Get a device's memalloc_noio flag.
> + * @dev: Device to handle.
> + *
> + * Return the device's memalloc_noio flag.
> + *
> + * The device power lock is held because bitfield is not SMP-safe.
> + */
> +bool pm_runtime_get_memalloc_noio(struct device *dev)
> +{
> +	bool ret;
> +	spin_lock_irq(&dev->power.lock);
> +	ret = dev->power.memalloc_noio_resume;
> +	spin_unlock_irq(&dev->power.lock);
> +	return ret;
> +}

You don't need to acquire and release a spinlock just to read the
value.  Reading bitfields _is_ SMP-safe; writing them is not.

> +/*
> + * pm_runtime_set_memalloc_noio - Set a device's memalloc_noio flag.
> + * @dev: Device to handle.
> + * @enable: True for setting the flag and False for clearing the flag.
> + *
> + * Set the flag for all devices in the path from the device to the
> + * root device in the device tree if @enable is true, otherwise clear
> + * the flag for devices in the path which sibliings don't set the flag.

s/which/whose/
s/ii/i

> + *
> + * The function should only be called by block device, or network
> + * device driver for solving the deadlock problem during runtime
> + * resume:
> + * 	if memory allocation with GFP_KERNEL is called inside runtime
> + * 	resume callback of any one of its ancestors(or the block device
> + * 	itself), the deadlock may be triggered inside the memory
> + * 	allocation since it might not complete until the block device
> + * 	becomes active and the involed page I/O finishes. The situation
> + * 	is pointed out first by Alan Stern. Network device are involved
> + * 	in iSCSI kind of situation.
> + *
> + * The lock of dev_hotplug_mutex is held in the function for handling
> + * hotplug race because pm_runtime_set_memalloc_noio() may be called
> + * in async probe().
> + */
> +void pm_runtime_set_memalloc_noio(struct device *dev, bool enable)
> +{
> +	static DEFINE_MUTEX(dev_hotplug_mutex);
> +
> +	mutex_lock(&dev_hotplug_mutex);
> +	while (dev) {

Unless you think somebody is likely to call this function with dev 
equal to NULL, this can simply be

	for (;;) {

> +		/* hold power lock since bitfield is not SMP-safe. */
> +		spin_lock_irq(&dev->power.lock);
> +		dev->power.memalloc_noio_resume = enable;
> +		spin_unlock_irq(&dev->power.lock);
> +
> +		dev = dev->parent;
> +
> +		/* only clear the flag for one device if all
> +		 * children of the device don't set the flag.
> +		 */
> +		if (!dev || (!enable &&

... thanks to this test.

> +			     device_for_each_child(dev, NULL,
> +						   dev_memalloc_noio)))
> +			break;
> +	}
> +	mutex_unlock(&dev_hotplug_mutex);
> +}

This might not work if somebody calls pm_runtime_set_memalloc_noio(dev,
true) and then afterwards registers dev at the same time as someone
else calls pm_runtime_set_memalloc_noio(dev2, false), if dev and dev2
have the same parent.

Perhaps the kerneldoc should mention that this function must not be 
called until after dev is registered.

Alan Stern


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v3 4/6] net/core: apply pm_runtime_set_memalloc_noio on network devices
  2012-10-29 12:23 ` [PATCH v3 4/6] net/core: apply pm_runtime_set_memalloc_noio on network devices Ming Lei
@ 2012-10-29 15:44   ` Alan Stern
  0 siblings, 0 replies; 23+ messages in thread
From: Alan Stern @ 2012-10-29 15:44 UTC (permalink / raw)
  To: Ming Lei
  Cc: linux-kernel, Oliver Neukum, Minchan Kim, Greg Kroah-Hartman,
	Rafael J. Wysocki, Jens Axboe, David S. Miller, Andrew Morton,
	netdev, linux-usb, linux-pm, linux-mm, Eric Dumazet,
	David Decotigny, Tom Herbert, Ingo Molnar

On Mon, 29 Oct 2012, Ming Lei wrote:

> Deadlock might be caused by allocating memory with GFP_KERNEL in
> runtime_resume callback of network devices in iSCSI situation, so
> mark network devices and its ancestor as 'memalloc_noio_resume'
> with the introduced pm_runtime_set_memalloc_noio().

> @@ -1411,6 +1414,8 @@ int netdev_register_kobject(struct net_device *net)
>  	*groups++ = &netstat_group;
>  #endif /* CONFIG_SYSFS */
>  
> +	pm_runtime_set_memalloc_noio(dev, true);
> +
>  	error = device_add(dev);
>  	if (error)
>  		return error;

This is an example of what I described earlier.  The 
pm_runtime_set_memalloc_noio() call should come after device_add(), not 
before.

(Not to mention that this version of the code doesn't correctly handle
the case where device_add() fails.)

Alan Stern


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v3 2/6] PM / Runtime: introduce pm_runtime_set[get]_memalloc_noio()
  2012-10-29 15:41   ` Alan Stern
@ 2012-10-30  3:21     ` Ming Lei
  2012-10-30 10:57       ` Oliver Neukum
  2012-10-30 15:38       ` Alan Stern
  0 siblings, 2 replies; 23+ messages in thread
From: Ming Lei @ 2012-10-30  3:21 UTC (permalink / raw)
  To: Alan Stern
  Cc: linux-kernel, Oliver Neukum, Minchan Kim, Greg Kroah-Hartman,
	Rafael J. Wysocki, Jens Axboe, David S. Miller, Andrew Morton,
	netdev, linux-usb, linux-pm, linux-mm

On Mon, Oct 29, 2012 at 11:41 PM, Alan Stern <stern@rowland.harvard.edu> wrote:
> On Mon, 29 Oct 2012, Ming Lei wrote:
>
>> The patch introduces the flag of memalloc_noio_resume in
>> 'struct dev_pm_info' to help PM core to teach mm not allocating
>> memory with GFP_KERNEL flag for avoiding probable deadlock
>> problem.
>>
>> As explained in the comment, any GFP_KERNEL allocation inside
>> runtime_resume on any one of device in the path from one block
>> or network device to the root device in the device tree may cause
>> deadlock, the introduced pm_runtime_set_memalloc_noio() sets or
>> clears the flag on device of the path recursively.
>>
>> This patch also introduces pm_runtime_get_memalloc_noio() because
>> the flag may be accessed in block device's error handling path
>> (for example, usb device reset)
>
>> +/*
>> + * pm_runtime_get_memalloc_noio - Get a device's memalloc_noio flag.
>> + * @dev: Device to handle.
>> + *
>> + * Return the device's memalloc_noio flag.
>> + *
>> + * The device power lock is held because bitfield is not SMP-safe.
>> + */
>> +bool pm_runtime_get_memalloc_noio(struct device *dev)
>> +{
>> +     bool ret;
>> +     spin_lock_irq(&dev->power.lock);
>> +     ret = dev->power.memalloc_noio_resume;
>> +     spin_unlock_irq(&dev->power.lock);
>> +     return ret;
>> +}
>
> You don't need to acquire and release a spinlock just to read the
> value.  Reading bitfields _is_ SMP-safe; writing them is not.

Thanks for your review.

As you pointed out before, the flag need to be checked before
resetting usb devices, so the lock should be held to make another
context(CPU) see the updated value suppose one context(CPU)
call pm_runtime_set_memalloc_noio() to change the flag at the
same time.

The lock needn't to be held when the function is called inside
pm_runtime_set_memalloc_noio(),  so the bitfield flag should
be checked directly without holding power lock in dev_memalloc_noio().

>
>> +/*
>> + * pm_runtime_set_memalloc_noio - Set a device's memalloc_noio flag.
>> + * @dev: Device to handle.
>> + * @enable: True for setting the flag and False for clearing the flag.
>> + *
>> + * Set the flag for all devices in the path from the device to the
>> + * root device in the device tree if @enable is true, otherwise clear
>> + * the flag for devices in the path which sibliings don't set the flag.
>
> s/which/whose/
> s/ii/i

Will fix it in -v4.

>> + *
>> + * The function should only be called by block device, or network
>> + * device driver for solving the deadlock problem during runtime
>> + * resume:
>> + *   if memory allocation with GFP_KERNEL is called inside runtime
>> + *   resume callback of any one of its ancestors(or the block device
>> + *   itself), the deadlock may be triggered inside the memory
>> + *   allocation since it might not complete until the block device
>> + *   becomes active and the involed page I/O finishes. The situation
>> + *   is pointed out first by Alan Stern. Network device are involved
>> + *   in iSCSI kind of situation.
>> + *
>> + * The lock of dev_hotplug_mutex is held in the function for handling
>> + * hotplug race because pm_runtime_set_memalloc_noio() may be called
>> + * in async probe().
>> + */
>> +void pm_runtime_set_memalloc_noio(struct device *dev, bool enable)
>> +{
>> +     static DEFINE_MUTEX(dev_hotplug_mutex);
>> +
>> +     mutex_lock(&dev_hotplug_mutex);
>> +     while (dev) {
>
> Unless you think somebody is likely to call this function with dev
> equal to NULL, this can simply be
>
>         for (;;) {
>
>> +             /* hold power lock since bitfield is not SMP-safe. */
>> +             spin_lock_irq(&dev->power.lock);
>> +             dev->power.memalloc_noio_resume = enable;
>> +             spin_unlock_irq(&dev->power.lock);
>> +
>> +             dev = dev->parent;
>> +
>> +             /* only clear the flag for one device if all
>> +              * children of the device don't set the flag.
>> +              */
>> +             if (!dev || (!enable &&
>
> ... thanks to this test.
>
>> +                          device_for_each_child(dev, NULL,
>> +                                                dev_memalloc_noio)))
>> +                     break;
>> +     }
>> +     mutex_unlock(&dev_hotplug_mutex);
>> +}
>
> This might not work if somebody calls pm_runtime_set_memalloc_noio(dev,
> true) and then afterwards registers dev at the same time as someone
> else calls pm_runtime_set_memalloc_noio(dev2, false), if dev and dev2
> have the same parent.

Good catch, pm_runtime_set_memalloc_noio() should be called between
device_add() and device_del() on block/network device.

> Perhaps the kerneldoc should mention that this function must not be
> called until after dev is registered.

Yes,  it should be added in -v4.


Thanks,
--
Ming Lei

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v3 2/6] PM / Runtime: introduce pm_runtime_set[get]_memalloc_noio()
  2012-10-30  3:21     ` Ming Lei
@ 2012-10-30 10:57       ` Oliver Neukum
  2012-10-30 11:30         ` Ming Lei
  2012-10-30 15:38       ` Alan Stern
  1 sibling, 1 reply; 23+ messages in thread
From: Oliver Neukum @ 2012-10-30 10:57 UTC (permalink / raw)
  To: Ming Lei
  Cc: Alan Stern, linux-kernel, Minchan Kim, Greg Kroah-Hartman,
	Rafael J. Wysocki, Jens Axboe, David S. Miller, Andrew Morton,
	netdev, linux-usb, linux-pm, linux-mm

On Tuesday 30 October 2012 11:21:33 Ming Lei wrote:
> On Mon, Oct 29, 2012 at 11:41 PM, Alan Stern <stern@rowland.harvard.edu> wrote:
> > On Mon, 29 Oct 2012, Ming Lei wrote:
> >
> >> The patch introduces the flag of memalloc_noio_resume in
> >> 'struct dev_pm_info' to help PM core to teach mm not allocating
> >> memory with GFP_KERNEL flag for avoiding probable deadlock
> >> problem.
> >>
> >> As explained in the comment, any GFP_KERNEL allocation inside
> >> runtime_resume on any one of device in the path from one block
> >> or network device to the root device in the device tree may cause
> >> deadlock, the introduced pm_runtime_set_memalloc_noio() sets or
> >> clears the flag on device of the path recursively.
> >>
> >> This patch also introduces pm_runtime_get_memalloc_noio() because
> >> the flag may be accessed in block device's error handling path
> >> (for example, usb device reset)
> >
> >> +/*
> >> + * pm_runtime_get_memalloc_noio - Get a device's memalloc_noio flag.
> >> + * @dev: Device to handle.
> >> + *
> >> + * Return the device's memalloc_noio flag.
> >> + *
> >> + * The device power lock is held because bitfield is not SMP-safe.
> >> + */
> >> +bool pm_runtime_get_memalloc_noio(struct device *dev)
> >> +{
> >> +     bool ret;
> >> +     spin_lock_irq(&dev->power.lock);
> >> +     ret = dev->power.memalloc_noio_resume;
> >> +     spin_unlock_irq(&dev->power.lock);
> >> +     return ret;
> >> +}
> >
> > You don't need to acquire and release a spinlock just to read the
> > value.  Reading bitfields _is_ SMP-safe; writing them is not.
> 
> Thanks for your review.
> 
> As you pointed out before, the flag need to be checked before
> resetting usb devices, so the lock should be held to make another
> context(CPU) see the updated value suppose one context(CPU)
> call pm_runtime_set_memalloc_noio() to change the flag at the
> same time.
> 
> The lock needn't to be held when the function is called inside
> pm_runtime_set_memalloc_noio(),  so the bitfield flag should
> be checked directly without holding power lock in dev_memalloc_noio().

Hi,

how is this to work with power management domains?
And I may be dense, but disks are added in slave_configure().
This seems to be a race to me.

	Regards
		Oliver


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v3 2/6] PM / Runtime: introduce pm_runtime_set[get]_memalloc_noio()
  2012-10-30 10:57       ` Oliver Neukum
@ 2012-10-30 11:30         ` Ming Lei
  0 siblings, 0 replies; 23+ messages in thread
From: Ming Lei @ 2012-10-30 11:30 UTC (permalink / raw)
  To: Oliver Neukum
  Cc: Alan Stern, linux-kernel, Minchan Kim, Greg Kroah-Hartman,
	Rafael J. Wysocki, Jens Axboe, David S. Miller, Andrew Morton,
	netdev, linux-usb, linux-pm, linux-mm

Hi,

On Tue, Oct 30, 2012 at 6:57 PM, Oliver Neukum <oneukum@suse.de> wrote:
> how is this to work with power management domains?

Could you explain it in a bit detail? Why is PM domain involved?

Suppose PM domain is involved, its domain runtime_resume callback
is still run in the context with PF_MEMALLOC_NOIO flag set if the
affected 'device' is passed to the callback.

> And I may be dense, but disks are added in slave_configure().
> This seems to be a race to me.

Sorry, could you describe what is the race?

Suppose drivers set correct parent device to the disk device(gendisk),
then add the disk into device model via register_disk(), the solution
should be fine.

Thanks,
--
Ming Lei

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v3 2/6] PM / Runtime: introduce pm_runtime_set[get]_memalloc_noio()
  2012-10-30  3:21     ` Ming Lei
  2012-10-30 10:57       ` Oliver Neukum
@ 2012-10-30 15:38       ` Alan Stern
  2012-10-30 16:00         ` Ming Lei
  1 sibling, 1 reply; 23+ messages in thread
From: Alan Stern @ 2012-10-30 15:38 UTC (permalink / raw)
  To: Ming Lei
  Cc: linux-kernel, Oliver Neukum, Minchan Kim, Greg Kroah-Hartman,
	Rafael J. Wysocki, Jens Axboe, David S. Miller, Andrew Morton,
	netdev, linux-usb, linux-pm, linux-mm

On Tue, 30 Oct 2012, Ming Lei wrote:

> >> +bool pm_runtime_get_memalloc_noio(struct device *dev)
> >> +{
> >> +     bool ret;
> >> +     spin_lock_irq(&dev->power.lock);
> >> +     ret = dev->power.memalloc_noio_resume;
> >> +     spin_unlock_irq(&dev->power.lock);
> >> +     return ret;
> >> +}
> >
> > You don't need to acquire and release a spinlock just to read the
> > value.  Reading bitfields _is_ SMP-safe; writing them is not.
> 
> Thanks for your review.
> 
> As you pointed out before, the flag need to be checked before
> resetting usb devices, so the lock should be held to make another
> context(CPU) see the updated value suppose one context(CPU)
> call pm_runtime_set_memalloc_noio() to change the flag at the
> same time.

Okay, I see your point.  But acquiring the lock here doesn't solve the 
problem.  Suppose a thread is about to reset a USB mass-storage device.  
It acquires the lock and sees that the noio flag is clear.  But before 
it can issue the reset, another thread sets the noio flag.

I'm not sure what the best solution is.

> The lock needn't to be held when the function is called inside
> pm_runtime_set_memalloc_noio(),  so the bitfield flag should
> be checked directly without holding power lock in dev_memalloc_noio().

Yes.

A couple of other things...  Runtime resume can be blocked by runtime
suspend, if a resume is requested while the suspend is in progress.  
Therefore the runtime suspend code also needs to save-set-restore the
noio flag.

Also, we should set the noio flag at the start of 
usb_stor_control_thread, because everything that thread does can 
potentially block an I/O operation.

Lastly, pm_runtime_get_memalloc_noio always returns false when 
CONFIG_PM_RUNTIME is disabled.  But we still need to prevent I/O during 
usb_reset_device even when there's no runtime PM.  Maybe the simplest 
answer is always to set noio during resets.  That would also help with 
the race described above.

Alan Stern


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v3 2/6] PM / Runtime: introduce pm_runtime_set[get]_memalloc_noio()
  2012-10-30 15:38       ` Alan Stern
@ 2012-10-30 16:00         ` Ming Lei
  2012-10-30 16:15           ` Ming Lei
                             ` (2 more replies)
  0 siblings, 3 replies; 23+ messages in thread
From: Ming Lei @ 2012-10-30 16:00 UTC (permalink / raw)
  To: Alan Stern
  Cc: linux-kernel, Oliver Neukum, Minchan Kim, Greg Kroah-Hartman,
	Rafael J. Wysocki, Jens Axboe, David S. Miller, Andrew Morton,
	netdev, linux-usb, linux-pm, linux-mm

On Tue, Oct 30, 2012 at 11:38 PM, Alan Stern <stern@rowland.harvard.edu> wrote:

>
> Okay, I see your point.  But acquiring the lock here doesn't solve the
> problem.  Suppose a thread is about to reset a USB mass-storage device.
> It acquires the lock and sees that the noio flag is clear.  But before
> it can issue the reset, another thread sets the noio flag.

If the USB mass-storage device is being reseted, the flag should be set
already generally.  If the flag is still unset, that means the disk/network
device isn't added into system(or removed just now), so memory allocation
with block I/O should be allowed during the reset. Looks it isn't one problem,
isn't it?

> I'm not sure what the best solution is.
>
>> The lock needn't to be held when the function is called inside
>> pm_runtime_set_memalloc_noio(),  so the bitfield flag should
>> be checked directly without holding power lock in dev_memalloc_noio().
>
> Yes.
>
> A couple of other things...  Runtime resume can be blocked by runtime
> suspend, if a resume is requested while the suspend is in progress.
> Therefore the runtime suspend code also needs to save-set-restore the
> noio flag.

Looks the simplest approach is to handle the noio flag thing at the start and
end of rpm_resume.

> Also, we should set the noio flag at the start of
> usb_stor_control_thread, because everything that thread does can
> potentially block an I/O operation.

Yes, it should be done, and all GFP_NOIO in usbcore should be converted
into GFP_KERNEL together. And the work shouldn't be started until
the patchset is merged.

> Lastly, pm_runtime_get_memalloc_noio always returns false when
> CONFIG_PM_RUNTIME is disabled.  But we still need to prevent I/O during
> usb_reset_device even when there's no runtime PM.  Maybe the simplest
> answer is always to set noio during resets.  That would also help with
> the race described above.

I have thought about this. IMO, pm_runtime_get_memalloc_noio should
return true always if CONFIG_PM_RUNTIME is unset.

Thanks,
--
Ming Lei

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v3 2/6] PM / Runtime: introduce pm_runtime_set[get]_memalloc_noio()
  2012-10-30 16:00         ` Ming Lei
@ 2012-10-30 16:15           ` Ming Lei
  2012-10-30 16:30           ` Oliver Neukum
  2012-10-30 16:54           ` Alan Stern
  2 siblings, 0 replies; 23+ messages in thread
From: Ming Lei @ 2012-10-30 16:15 UTC (permalink / raw)
  To: Alan Stern
  Cc: linux-kernel, Oliver Neukum, Minchan Kim, Greg Kroah-Hartman,
	Rafael J. Wysocki, Jens Axboe, David S. Miller, Andrew Morton,
	netdev, linux-usb, linux-pm, linux-mm

On Wed, Oct 31, 2012 at 12:00 AM, Ming Lei <ming.lei@canonical.com> wrote:
>
> Looks the simplest approach is to handle the noio flag thing at the start and
> end of rpm_resume.

Sorry, that doesn't work, runtime_suspend need that too because memory
allocation with block I/O might deadlock when doing I/O on the same device.

Thanks,
--
Ming Lei

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v3 2/6] PM / Runtime: introduce pm_runtime_set[get]_memalloc_noio()
  2012-10-30 16:00         ` Ming Lei
  2012-10-30 16:15           ` Ming Lei
@ 2012-10-30 16:30           ` Oliver Neukum
  2012-10-31  2:08             ` Ming Lei
  2012-10-30 16:54           ` Alan Stern
  2 siblings, 1 reply; 23+ messages in thread
From: Oliver Neukum @ 2012-10-30 16:30 UTC (permalink / raw)
  To: Ming Lei
  Cc: Alan Stern, linux-kernel, Minchan Kim, Greg Kroah-Hartman,
	Rafael J. Wysocki, Jens Axboe, David S. Miller, Andrew Morton,
	netdev, linux-usb, linux-pm, linux-mm

On Wednesday 31 October 2012 00:00:56 Ming Lei wrote:
> On Tue, Oct 30, 2012 at 11:38 PM, Alan Stern <stern@rowland.harvard.edu> wrote:
> 
> >
> > Okay, I see your point.  But acquiring the lock here doesn't solve the
> > problem.  Suppose a thread is about to reset a USB mass-storage device.
> > It acquires the lock and sees that the noio flag is clear.  But before
> > it can issue the reset, another thread sets the noio flag.
> 
> If the USB mass-storage device is being reseted, the flag should be set
> already generally.  If the flag is still unset, that means the disk/network
> device isn't added into system(or removed just now), so memory allocation
> with block I/O should be allowed during the reset. Looks it isn't one problem,
> isn't it?

I am afraid it is, because a disk may just have been probed as the deviceis being reset.

	Regards
		Oliver


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v3 2/6] PM / Runtime: introduce pm_runtime_set[get]_memalloc_noio()
  2012-10-30 16:00         ` Ming Lei
  2012-10-30 16:15           ` Ming Lei
  2012-10-30 16:30           ` Oliver Neukum
@ 2012-10-30 16:54           ` Alan Stern
  2 siblings, 0 replies; 23+ messages in thread
From: Alan Stern @ 2012-10-30 16:54 UTC (permalink / raw)
  To: Ming Lei
  Cc: linux-kernel, Oliver Neukum, Minchan Kim, Greg Kroah-Hartman,
	Rafael J. Wysocki, Jens Axboe, David S. Miller, Andrew Morton,
	netdev, linux-usb, linux-pm, linux-mm

On Wed, 31 Oct 2012, Ming Lei wrote:

> On Tue, Oct 30, 2012 at 11:38 PM, Alan Stern <stern@rowland.harvard.edu> wrote:
> 
> >
> > Okay, I see your point.  But acquiring the lock here doesn't solve the
> > problem.  Suppose a thread is about to reset a USB mass-storage device.
> > It acquires the lock and sees that the noio flag is clear.  But before
> > it can issue the reset, another thread sets the noio flag.
> 
> If the USB mass-storage device is being reseted, the flag should be set
> already generally.  If the flag is still unset, that means the disk/network
> device isn't added into system(or removed just now), so memory allocation
> with block I/O should be allowed during the reset. Looks it isn't one problem,
> isn't it?

As Oliver said, it can be a problem.

> > Lastly, pm_runtime_get_memalloc_noio always returns false when
> > CONFIG_PM_RUNTIME is disabled.  But we still need to prevent I/O during
> > usb_reset_device even when there's no runtime PM.  Maybe the simplest
> > answer is always to set noio during resets.  That would also help with
> > the race described above.
> 
> I have thought about this. IMO, pm_runtime_get_memalloc_noio should
> return true always if CONFIG_PM_RUNTIME is unset.

That's okay as long as the only user of pm_runtime_get_memalloc_noio
(apart from the runtime PM core) is usbcore.

Alan Stern


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v3 2/6] PM / Runtime: introduce pm_runtime_set[get]_memalloc_noio()
  2012-10-30 16:30           ` Oliver Neukum
@ 2012-10-31  2:08             ` Ming Lei
  2012-10-31  3:05               ` Ming Lei
  0 siblings, 1 reply; 23+ messages in thread
From: Ming Lei @ 2012-10-31  2:08 UTC (permalink / raw)
  To: Oliver Neukum
  Cc: Alan Stern, linux-kernel, Minchan Kim, Greg Kroah-Hartman,
	Rafael J. Wysocki, Jens Axboe, David S. Miller, Andrew Morton,
	netdev, linux-usb, linux-pm, linux-mm

On Wed, Oct 31, 2012 at 12:30 AM, Oliver Neukum <oneukum@suse.de> wrote:
>> If the USB mass-storage device is being reseted, the flag should be set
>> already generally.  If the flag is still unset, that means the disk/network
>> device isn't added into system(or removed just now), so memory allocation
>> with block I/O should be allowed during the reset. Looks it isn't one problem,
>> isn't it?
>
> I am afraid it is, because a disk may just have been probed as the deviceis being reset.

Yes, it is probable, and sounds like similar with 'root_wait' problem, see
prepare_namespace(): init/do_mounts.c, so looks no good solution
for the problem, and maybe we have to set the flag always before resetting
usb device.


Thanks,
--
Ming Lei

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v3 2/6] PM / Runtime: introduce pm_runtime_set[get]_memalloc_noio()
  2012-10-31  2:08             ` Ming Lei
@ 2012-10-31  3:05               ` Ming Lei
  2012-10-31  8:37                 ` Oliver Neukum
  2012-10-31 15:20                 ` Alan Stern
  0 siblings, 2 replies; 23+ messages in thread
From: Ming Lei @ 2012-10-31  3:05 UTC (permalink / raw)
  To: Oliver Neukum
  Cc: Alan Stern, linux-kernel, Minchan Kim, Greg Kroah-Hartman,
	Rafael J. Wysocki, Jens Axboe, David S. Miller, Andrew Morton,
	netdev, linux-usb, linux-pm, linux-mm

On Wed, Oct 31, 2012 at 10:08 AM, Ming Lei <ming.lei@canonical.com> wrote:
>> I am afraid it is, because a disk may just have been probed as the deviceis being reset.
>
> Yes, it is probable, and sounds like similar with 'root_wait' problem, see
> prepare_namespace(): init/do_mounts.c, so looks no good solution
> for the problem, and maybe we have to set the flag always before resetting
> usb device.

The below idea may help the problem which 'memalloc_noio' flag isn't set during
usb_reset_device().

- for usb mass storage device, call pm_runtime_set_memalloc_noio(true)
  inside usb_stor_probe2() and uas_probe(), and call
  pm_runtime_set_memalloc_noio(false) inside uas_disconnect()
  and usb_stor_disconnect().

- for usb network device, register_netdev() is always called inside usb
  interface's probe(),  looks no such problem.

Thanks,
--
Ming Lei

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v3 2/6] PM / Runtime: introduce pm_runtime_set[get]_memalloc_noio()
  2012-10-31  3:05               ` Ming Lei
@ 2012-10-31  8:37                 ` Oliver Neukum
  2012-10-31 15:20                 ` Alan Stern
  1 sibling, 0 replies; 23+ messages in thread
From: Oliver Neukum @ 2012-10-31  8:37 UTC (permalink / raw)
  To: Ming Lei
  Cc: Alan Stern, linux-kernel, Minchan Kim, Greg Kroah-Hartman,
	Rafael J. Wysocki, Jens Axboe, David S. Miller, Andrew Morton,
	netdev, linux-usb, linux-pm, linux-mm

On Wednesday 31 October 2012 11:05:33 Ming Lei wrote:
> On Wed, Oct 31, 2012 at 10:08 AM, Ming Lei <ming.lei@canonical.com> wrote:
> >> I am afraid it is, because a disk may just have been probed as the deviceis being reset.
> >
> > Yes, it is probable, and sounds like similar with 'root_wait' problem, see
> > prepare_namespace(): init/do_mounts.c, so looks no good solution
> > for the problem, and maybe we have to set the flag always before resetting
> > usb device.
> 
> The below idea may help the problem which 'memalloc_noio' flag isn't set during
> usb_reset_device().
> 
> - for usb mass storage device, call pm_runtime_set_memalloc_noio(true)
>   inside usb_stor_probe2() and uas_probe(), and call
>   pm_runtime_set_memalloc_noio(false) inside uas_disconnect()
>   and usb_stor_disconnect().
> 
> - for usb network device, register_netdev() is always called inside usb
>   interface's probe(),  looks no such problem.

This still leaves networking done over PPP in the cold.

	Regards
		Oliver


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v3 2/6] PM / Runtime: introduce pm_runtime_set[get]_memalloc_noio()
  2012-10-31  3:05               ` Ming Lei
  2012-10-31  8:37                 ` Oliver Neukum
@ 2012-10-31 15:20                 ` Alan Stern
  2012-10-31 15:41                   ` Alan Stern
  1 sibling, 1 reply; 23+ messages in thread
From: Alan Stern @ 2012-10-31 15:20 UTC (permalink / raw)
  To: Ming Lei
  Cc: Oliver Neukum, linux-kernel, Minchan Kim, Greg Kroah-Hartman,
	Rafael J. Wysocki, Jens Axboe, David S. Miller, Andrew Morton,
	netdev, linux-usb, linux-pm, linux-mm

On Wed, 31 Oct 2012, Ming Lei wrote:

> The below idea may help the problem which 'memalloc_noio' flag isn't set during
> usb_reset_device().
> 
> - for usb mass storage device, call pm_runtime_set_memalloc_noio(true)
>   inside usb_stor_probe2() and uas_probe(), and call
>   pm_runtime_set_memalloc_noio(false) inside uas_disconnect()
>   and usb_stor_disconnect().

Why would you want to do that?  The probe and disconnect routines
usually -- but not always -- run in the khubd thread.  Surely you don't
want to prevent khubd from using GFP_KERNEL?

And what if probe runs in khubd but disconnect runs in a different 
thread?

Alan Stern


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v3 2/6] PM / Runtime: introduce pm_runtime_set[get]_memalloc_noio()
  2012-10-31 15:20                 ` Alan Stern
@ 2012-10-31 15:41                   ` Alan Stern
  2012-10-31 23:18                     ` Ming Lei
  0 siblings, 1 reply; 23+ messages in thread
From: Alan Stern @ 2012-10-31 15:41 UTC (permalink / raw)
  To: Ming Lei
  Cc: Oliver Neukum, linux-kernel, Minchan Kim, Greg Kroah-Hartman,
	Rafael J. Wysocki, Jens Axboe, David S. Miller, Andrew Morton,
	netdev, linux-usb, linux-pm, linux-mm

On Wed, 31 Oct 2012, Alan Stern wrote:

> On Wed, 31 Oct 2012, Ming Lei wrote:
> 
> > The below idea may help the problem which 'memalloc_noio' flag isn't set during
> > usb_reset_device().
> > 
> > - for usb mass storage device, call pm_runtime_set_memalloc_noio(true)
> >   inside usb_stor_probe2() and uas_probe(), and call
> >   pm_runtime_set_memalloc_noio(false) inside uas_disconnect()
> >   and usb_stor_disconnect().
> 
> Why would you want to do that?  The probe and disconnect routines
> usually -- but not always -- run in the khubd thread.  Surely you don't
> want to prevent khubd from using GFP_KERNEL?
> 
> And what if probe runs in khubd but disconnect runs in a different 
> thread?

Sorry, I misread your message.  You are setting the device's flag, not 
the thread's flag.

This still doesn't help in this case where CONFIG_PM_RUNTIME is 
disabled.  I think it will be simpler to set the noio flag during every 
device reset.

Alan Stern


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v3 2/6] PM / Runtime: introduce pm_runtime_set[get]_memalloc_noio()
  2012-10-31 15:41                   ` Alan Stern
@ 2012-10-31 23:18                     ` Ming Lei
  0 siblings, 0 replies; 23+ messages in thread
From: Ming Lei @ 2012-10-31 23:18 UTC (permalink / raw)
  To: Alan Stern
  Cc: Oliver Neukum, linux-kernel, Minchan Kim, Greg Kroah-Hartman,
	Rafael J. Wysocki, Jens Axboe, David S. Miller, Andrew Morton,
	netdev, linux-usb, linux-pm, linux-mm

On Wed, Oct 31, 2012 at 11:41 PM, Alan Stern <stern@rowland.harvard.edu> wrote:
>
> Sorry, I misread your message.  You are setting the device's flag, not
> the thread's flag.

Never mind.

>
> This still doesn't help in this case where CONFIG_PM_RUNTIME is
> disabled.  I think it will be simpler to set the noio flag during every
> device reset.

Yes, it's better to set the flag during every device reset now.

Also pppoe or network interface over serial port is a bit difficult to
deal with, as Oliver pointed out.


Thanks,
--
Ming Lei

^ permalink raw reply	[flat|nested] 23+ messages in thread

end of thread, other threads:[~2012-10-31 23:18 UTC | newest]

Thread overview: 23+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-10-29 12:23 [PATCH v3 0/6] solve deadlock caused by memory allocation with I/O Ming Lei
2012-10-29 12:23 ` [PATCH v3 1/6] mm: teach mm by current context info to not do I/O during memory allocation Ming Lei
2012-10-29 12:23 ` [PATCH v3 2/6] PM / Runtime: introduce pm_runtime_set[get]_memalloc_noio() Ming Lei
2012-10-29 15:41   ` Alan Stern
2012-10-30  3:21     ` Ming Lei
2012-10-30 10:57       ` Oliver Neukum
2012-10-30 11:30         ` Ming Lei
2012-10-30 15:38       ` Alan Stern
2012-10-30 16:00         ` Ming Lei
2012-10-30 16:15           ` Ming Lei
2012-10-30 16:30           ` Oliver Neukum
2012-10-31  2:08             ` Ming Lei
2012-10-31  3:05               ` Ming Lei
2012-10-31  8:37                 ` Oliver Neukum
2012-10-31 15:20                 ` Alan Stern
2012-10-31 15:41                   ` Alan Stern
2012-10-31 23:18                     ` Ming Lei
2012-10-30 16:54           ` Alan Stern
2012-10-29 12:23 ` [PATCH v3 3/6] block/genhd.c: apply pm_runtime_set_memalloc_noio on block devices Ming Lei
2012-10-29 12:23 ` [PATCH v3 4/6] net/core: apply pm_runtime_set_memalloc_noio on network devices Ming Lei
2012-10-29 15:44   ` Alan Stern
2012-10-29 12:23 ` [PATCH v3 5/6] PM / Runtime: force memory allocation with no I/O during runtime_resume callbcack Ming Lei
2012-10-29 12:24 ` [PATCH v3 6/6] USB: forbid memory allocation with I/O during bus reset Ming Lei

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).