All of lore.kernel.org
 help / color / mirror / Atom feed
* [RFC PATCH v1 0/3] mm/PM/USB: memory allocation with no io in need
@ 2012-10-16 15:59 Ming Lei
  2012-10-16 15:59   ` Ming Lei
                   ` (2 more replies)
  0 siblings, 3 replies; 23+ messages in thread
From: Ming Lei @ 2012-10-16 15:59 UTC (permalink / raw)
  To: linux-kernel
  Cc: Alan Stern, Oliver Neukum, Minchan Kim, Greg Kroah-Hartman,
	linux-usb, linux-pm

Hi,

This patch set introduces one process flag and trys to fix one deadlock
problem on block device during runtime resume or usb bus reset.

The 1st one is the change on include/sched.h and mm.

The other 2 patches are applied again PM and USB subsystem to demo
how to use the introduced mechanism to fix the deadlock problem.

V1:
        - take Minchan's change to avoid the check in alloc_page hot path
	- change the helpers' style into save/restore as suggested by Alan
	- memory allocation with no io in usb bus reset path for all devices
	as suggested by Greg and Oliver


 drivers/base/power/runtime.c |   14 ++++++++++++++
 drivers/usb/core/hub.c       |   11 +++++++++++
 include/linux/sched.h        |   11 +++++++++++
 mm/page_alloc.c              |   10 +++++++++-
 mm/vmscan.c                  |   13 +++++++++++++
 5 files changed, 58 insertions(+), 1 deletion(-)

Thanks,
--
Ming Lei


^ permalink raw reply	[flat|nested] 23+ messages in thread

* [RFC PATCH v1 1/3] mm: teach mm by current context info to not do I/O during memory allocation
  2012-10-16 15:59 [RFC PATCH v1 0/3] mm/PM/USB: memory allocation with no io in need Ming Lei
@ 2012-10-16 15:59   ` Ming Lei
  2012-10-16 15:59 ` [RFC PATCH v1 2/3] PM / Runtime: force memory allocation with no I/O during runtime_resume callbcack Ming Lei
  2012-10-16 15:59 ` [RFC PATCH v1 3/3] USB: forbid memory allocation with I/O during bus reset Ming Lei
  2 siblings, 0 replies; 23+ messages in thread
From: Ming Lei @ 2012-10-16 15:59 UTC (permalink / raw)
  To: linux-kernel
  Cc: Alan Stern, Oliver Neukum, Minchan Kim, Greg Kroah-Hartman,
	linux-usb, linux-pm, Ming Lei, Jiri Kosina, Andrew Morton,
	Mel Gorman, KAMEZAWA Hiroyuki, Michal Hocko, Ingo Molnar,
	Peter Zijlstra, Rafael J. Wysocki, linux-mm

This patch introduces PF_MEMALLOC_NOIO on process flag('flags' field of
'struct task_struct'), so that the flag can be set by one task
to avoid doing I/O inside memory allocation in the task's context.

The patch trys to solve one deadlock problem caused by block device,
and the problem may happen at least in the below situations:

- during block device runtime resume, if memory allocation with
GFP_KERNEL is called inside runtime resume callback of any one
of its ancestors(or the block device itself), the deadlock may be
triggered inside the memory allocation since it might not complete
until the block device becomes active and the involed page I/O finishes.
The situation is pointed out first by Alan Stern. It is not a good
approach to convert all GFP_KERNEL in the path into GFP_NOIO because
several subsystems may be involved(for example, PCI, USB and SCSI may
be involved for usb mass stoarage device)

- during error handling of usb mass storage deivce, USB bus reset
will be put on the device, so there shouldn't have any
memory allocation with GFP_KERNEL during USB bus reset, otherwise
the deadlock similar with above may be triggered. Unfortunately, any
usb device may include one mass storage interface in theory, so it
requires all usb interface drivers to handle the situation. In fact,
most usb drivers don't know how to handle bus reset on the device
and don't provide .pre_set() and .post_reset() callback at all, so
USB core has to unbind and bind driver for these devices. So it
is still not practical to resort to GFP_NOIO for solving the problem.

Also the introduced solution can be used by block subsystem or block
drivers too, for example, set the PF_MEMALLOC_NOIO flag before doing
actual I/O transfer.

Cc: Alan Stern <stern@rowland.harvard.edu>
Cc: Oliver Neukum <oneukum@suse.de>
Cc: Jiri Kosina <jiri.kosina@suse.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
Cc: linux-mm <linux-mm@kvack.org>
Signed-off-by: Minchan Kim <minchan@kernel.org>
Signed-off-by: Ming Lei <ming.lei@canonical.com>
---
 include/linux/sched.h |   11 +++++++++++
 mm/page_alloc.c       |   10 +++++++++-
 mm/vmscan.c           |   13 +++++++++++++
 3 files changed, 33 insertions(+), 1 deletion(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index f6961c9..c149ae7 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1811,6 +1811,7 @@ extern void thread_group_times(struct task_struct *p, cputime_t *ut, cputime_t *
 #define PF_FROZEN	0x00010000	/* frozen for system suspend */
 #define PF_FSTRANS	0x00020000	/* inside a filesystem transaction */
 #define PF_KSWAPD	0x00040000	/* I am kswapd */
+#define PF_MEMALLOC_NOIO 0x00080000	/* Allocating memory without IO involved */
 #define PF_LESS_THROTTLE 0x00100000	/* Throttle me less: I clean memory */
 #define PF_KTHREAD	0x00200000	/* I am a kernel thread */
 #define PF_RANDOMIZE	0x00400000	/* randomize virtual address space */
@@ -1848,6 +1849,16 @@ extern void thread_group_times(struct task_struct *p, cputime_t *ut, cputime_t *
 #define tsk_used_math(p) ((p)->flags & PF_USED_MATH)
 #define used_math() tsk_used_math(current)
 
+#define memalloc_noio() (current->flags & PF_MEMALLOC_NOIO)
+#define memalloc_noio_save(noio_flag) do { \
+	(noio_flag) = current->flags & PF_MEMALLOC_NOIO; \
+	current->flags |= PF_MEMALLOC_NOIO; \
+} while (0)
+#define memalloc_noio_restore(noio_flag) do { \
+	if (!(noio_flag)) \
+		current->flags &= ~PF_MEMALLOC_NOIO; \
+} while (0)
+
 /*
  * task->jobctl flags
  */
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 8e1be1c..e3746dd 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -2630,10 +2630,18 @@ retry_cpuset:
 	page = get_page_from_freelist(gfp_mask|__GFP_HARDWALL, nodemask, order,
 			zonelist, high_zoneidx, alloc_flags,
 			preferred_zone, migratetype);
-	if (unlikely(!page))
+	if (unlikely(!page)) {
+		/*
+		 * Resume, block IO and its error handling path
+		 * can deadlock because I/O on the device might not
+		 * complete.
+		 */
+		if (unlikely(memalloc_noio()))
+			gfp_mask &= ~GFP_IOFS;
 		page = __alloc_pages_slowpath(gfp_mask, order,
 				zonelist, high_zoneidx, nodemask,
 				preferred_zone, migratetype);
+	}
 
 	trace_mm_page_alloc(page, order, gfp_mask, migratetype);
 
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 1e9aa66..6647805 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -3298,6 +3298,19 @@ static int __zone_reclaim(struct zone *zone, gfp_t gfp_mask, unsigned int order)
 	};
 	unsigned long nr_slab_pages0, nr_slab_pages1;
 
+	if (unlikely(memalloc_noio())) {
+		sc.gfp_mask &= ~GFP_IOFS;
+		shrink.gfp_mask = sc.gfp_mask;
+		/*
+		 * We allow to reclaim only clean pages.
+		 * It can affect RECLAIM_SWAP and RECLAIM_WRITE mode
+		 * but this is really rare event and allocator can
+		 * fallback to other zones.
+		 */
+		sc.may_writepage = 0;
+		sc.may_swap = 0;
+	}
+
 	cond_resched();
 	/*
 	 * We need to be able to allocate from the reserves for RECLAIM_SWAP
-- 
1.7.9.5


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [RFC PATCH v1 1/3] mm: teach mm by current context info to not do I/O during memory allocation
@ 2012-10-16 15:59   ` Ming Lei
  0 siblings, 0 replies; 23+ messages in thread
From: Ming Lei @ 2012-10-16 15:59 UTC (permalink / raw)
  To: linux-kernel
  Cc: Alan Stern, Oliver Neukum, Minchan Kim, Greg Kroah-Hartman,
	linux-usb, linux-pm, Ming Lei, Jiri Kosina, Andrew Morton,
	Mel Gorman, KAMEZAWA Hiroyuki, Michal Hocko, Ingo Molnar,
	Peter Zijlstra, Rafael J. Wysocki, linux-mm

This patch introduces PF_MEMALLOC_NOIO on process flag('flags' field of
'struct task_struct'), so that the flag can be set by one task
to avoid doing I/O inside memory allocation in the task's context.

The patch trys to solve one deadlock problem caused by block device,
and the problem may happen at least in the below situations:

- during block device runtime resume, if memory allocation with
GFP_KERNEL is called inside runtime resume callback of any one
of its ancestors(or the block device itself), the deadlock may be
triggered inside the memory allocation since it might not complete
until the block device becomes active and the involed page I/O finishes.
The situation is pointed out first by Alan Stern. It is not a good
approach to convert all GFP_KERNEL in the path into GFP_NOIO because
several subsystems may be involved(for example, PCI, USB and SCSI may
be involved for usb mass stoarage device)

- during error handling of usb mass storage deivce, USB bus reset
will be put on the device, so there shouldn't have any
memory allocation with GFP_KERNEL during USB bus reset, otherwise
the deadlock similar with above may be triggered. Unfortunately, any
usb device may include one mass storage interface in theory, so it
requires all usb interface drivers to handle the situation. In fact,
most usb drivers don't know how to handle bus reset on the device
and don't provide .pre_set() and .post_reset() callback at all, so
USB core has to unbind and bind driver for these devices. So it
is still not practical to resort to GFP_NOIO for solving the problem.

Also the introduced solution can be used by block subsystem or block
drivers too, for example, set the PF_MEMALLOC_NOIO flag before doing
actual I/O transfer.

Cc: Alan Stern <stern@rowland.harvard.edu>
Cc: Oliver Neukum <oneukum@suse.de>
Cc: Jiri Kosina <jiri.kosina@suse.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
Cc: linux-mm <linux-mm@kvack.org>
Signed-off-by: Minchan Kim <minchan@kernel.org>
Signed-off-by: Ming Lei <ming.lei@canonical.com>
---
 include/linux/sched.h |   11 +++++++++++
 mm/page_alloc.c       |   10 +++++++++-
 mm/vmscan.c           |   13 +++++++++++++
 3 files changed, 33 insertions(+), 1 deletion(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index f6961c9..c149ae7 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1811,6 +1811,7 @@ extern void thread_group_times(struct task_struct *p, cputime_t *ut, cputime_t *
 #define PF_FROZEN	0x00010000	/* frozen for system suspend */
 #define PF_FSTRANS	0x00020000	/* inside a filesystem transaction */
 #define PF_KSWAPD	0x00040000	/* I am kswapd */
+#define PF_MEMALLOC_NOIO 0x00080000	/* Allocating memory without IO involved */
 #define PF_LESS_THROTTLE 0x00100000	/* Throttle me less: I clean memory */
 #define PF_KTHREAD	0x00200000	/* I am a kernel thread */
 #define PF_RANDOMIZE	0x00400000	/* randomize virtual address space */
@@ -1848,6 +1849,16 @@ extern void thread_group_times(struct task_struct *p, cputime_t *ut, cputime_t *
 #define tsk_used_math(p) ((p)->flags & PF_USED_MATH)
 #define used_math() tsk_used_math(current)
 
+#define memalloc_noio() (current->flags & PF_MEMALLOC_NOIO)
+#define memalloc_noio_save(noio_flag) do { \
+	(noio_flag) = current->flags & PF_MEMALLOC_NOIO; \
+	current->flags |= PF_MEMALLOC_NOIO; \
+} while (0)
+#define memalloc_noio_restore(noio_flag) do { \
+	if (!(noio_flag)) \
+		current->flags &= ~PF_MEMALLOC_NOIO; \
+} while (0)
+
 /*
  * task->jobctl flags
  */
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 8e1be1c..e3746dd 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -2630,10 +2630,18 @@ retry_cpuset:
 	page = get_page_from_freelist(gfp_mask|__GFP_HARDWALL, nodemask, order,
 			zonelist, high_zoneidx, alloc_flags,
 			preferred_zone, migratetype);
-	if (unlikely(!page))
+	if (unlikely(!page)) {
+		/*
+		 * Resume, block IO and its error handling path
+		 * can deadlock because I/O on the device might not
+		 * complete.
+		 */
+		if (unlikely(memalloc_noio()))
+			gfp_mask &= ~GFP_IOFS;
 		page = __alloc_pages_slowpath(gfp_mask, order,
 				zonelist, high_zoneidx, nodemask,
 				preferred_zone, migratetype);
+	}
 
 	trace_mm_page_alloc(page, order, gfp_mask, migratetype);
 
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 1e9aa66..6647805 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -3298,6 +3298,19 @@ static int __zone_reclaim(struct zone *zone, gfp_t gfp_mask, unsigned int order)
 	};
 	unsigned long nr_slab_pages0, nr_slab_pages1;
 
+	if (unlikely(memalloc_noio())) {
+		sc.gfp_mask &= ~GFP_IOFS;
+		shrink.gfp_mask = sc.gfp_mask;
+		/*
+		 * We allow to reclaim only clean pages.
+		 * It can affect RECLAIM_SWAP and RECLAIM_WRITE mode
+		 * but this is really rare event and allocator can
+		 * fallback to other zones.
+		 */
+		sc.may_writepage = 0;
+		sc.may_swap = 0;
+	}
+
 	cond_resched();
 	/*
 	 * We need to be able to allocate from the reserves for RECLAIM_SWAP
-- 
1.7.9.5

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [RFC PATCH v1 2/3] PM / Runtime: force memory allocation with no I/O during runtime_resume callbcack
  2012-10-16 15:59 [RFC PATCH v1 0/3] mm/PM/USB: memory allocation with no io in need Ming Lei
  2012-10-16 15:59   ` Ming Lei
@ 2012-10-16 15:59 ` Ming Lei
  2012-10-17  5:43   ` Rafael J. Wysocki
  2012-10-16 15:59 ` [RFC PATCH v1 3/3] USB: forbid memory allocation with I/O during bus reset Ming Lei
  2 siblings, 1 reply; 23+ messages in thread
From: Ming Lei @ 2012-10-16 15:59 UTC (permalink / raw)
  To: linux-kernel
  Cc: Alan Stern, Oliver Neukum, Minchan Kim, Greg Kroah-Hartman,
	linux-usb, linux-pm, Ming Lei, Rafael J. Wysocki

This patch applies the introduced memalloc_noio_save() and
memalloc_noio_restore() to force memory allocation with no I/O
during runtime_resume callback.

Cc: Alan Stern <stern@rowland.harvard.edu>
Cc: Oliver Neukum <oneukum@suse.de>
Cc: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: Ming Lei <ming.lei@canonical.com>
---
 drivers/base/power/runtime.c |   14 ++++++++++++++
 1 file changed, 14 insertions(+)

diff --git a/drivers/base/power/runtime.c b/drivers/base/power/runtime.c
index 3148b10..c71a8f0 100644
--- a/drivers/base/power/runtime.c
+++ b/drivers/base/power/runtime.c
@@ -503,6 +503,7 @@ static int rpm_resume(struct device *dev, int rpmflags)
 	int (*callback)(struct device *);
 	struct device *parent = NULL;
 	int retval = 0;
+	unsigned int noio_flag;
 
 	trace_rpm_resume(dev, rpmflags);
 
@@ -652,7 +653,20 @@ static int rpm_resume(struct device *dev, int rpmflags)
 	if (!callback && dev->driver && dev->driver->pm)
 		callback = dev->driver->pm->runtime_resume;
 
+	/*
+	 * Deadlock might be caused if memory allocation with GFP_KERNEL
+	 * happens inside runtime_resume callback of one block device's
+	 * ancestor or the block device itself. The easiest approach is
+	 * to forbid I/O inside runtime_resume of all devices.
+	 *
+	 * In fact, it can be done only if the deivce is a block device
+	 * or there is one block device descendant. But that may become
+	 * complicated and not efficient because device tree traversing
+	 * is involved.
+	 */
+	memalloc_noio_save(noio_flag);
 	retval = rpm_callback(callback, dev);
+	memalloc_noio_restore(noio_flag);
 	if (retval) {
 		__update_runtime_status(dev, RPM_SUSPENDED);
 		pm_runtime_cancel_pending(dev);
-- 
1.7.9.5


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [RFC PATCH v1 3/3] USB: forbid memory allocation with I/O during bus reset
  2012-10-16 15:59 [RFC PATCH v1 0/3] mm/PM/USB: memory allocation with no io in need Ming Lei
  2012-10-16 15:59   ` Ming Lei
  2012-10-16 15:59 ` [RFC PATCH v1 2/3] PM / Runtime: force memory allocation with no I/O during runtime_resume callbcack Ming Lei
@ 2012-10-16 15:59 ` Ming Lei
  2 siblings, 0 replies; 23+ messages in thread
From: Ming Lei @ 2012-10-16 15:59 UTC (permalink / raw)
  To: linux-kernel
  Cc: Alan Stern, Oliver Neukum, Minchan Kim, Greg Kroah-Hartman,
	linux-usb, linux-pm, Ming Lei

If one storage interface or usb network interface(iSCSI case)
exists in current configuration, memory allocation with
GFP_KERNEL during usb_device_reset() might trigger I/O transfer
on the storage interface itself and cause deadlock because
the 'us->dev_mutex' is held in .pre_reset() and the storage
interface can't do I/O transfer when the reset is triggered
by other interface, or the error handling can't be completed
if the reset is triggered by the storage itself(error handling path).

Cc: Alan Stern <stern@rowland.harvard.edu>
Cc: Oliver Neukum <oneukum@suse.de>
Signed-off-by: Ming Lei <ming.lei@canonical.com>
---
 drivers/usb/core/hub.c |   11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/drivers/usb/core/hub.c b/drivers/usb/core/hub.c
index 9dc8ff2..3ea81f6 100644
--- a/drivers/usb/core/hub.c
+++ b/drivers/usb/core/hub.c
@@ -5028,6 +5028,7 @@ int usb_reset_device(struct usb_device *udev)
 {
 	int ret;
 	int i;
+	unsigned int noio_flag;
 	struct usb_host_config *config = udev->actconfig;
 
 	if (udev->state == USB_STATE_NOTATTACHED ||
@@ -5037,6 +5038,15 @@ int usb_reset_device(struct usb_device *udev)
 		return -EINVAL;
 	}
 
+	/*
+	 * Don't allocate memory with GFP_KERNEL in current
+	 * context to avoid possible deadlock if usb mass
+	 * storage interface or usbnet interface(iSCSI case)
+	 * is included in current configuration. The easiest
+	 * approach is to do it for all devices.
+	 */
+	memalloc_noio_save(noio_flag);
+
 	/* Prevent autosuspend during the reset */
 	usb_autoresume_device(udev);
 
@@ -5081,6 +5091,7 @@ int usb_reset_device(struct usb_device *udev)
 	}
 
 	usb_autosuspend_device(udev);
+	memalloc_noio_restore(noio_flag);
 	return ret;
 }
 EXPORT_SYMBOL_GPL(usb_reset_device);
-- 
1.7.9.5


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* Re: [RFC PATCH v1 1/3] mm: teach mm by current context info to not do I/O during memory allocation
  2012-10-16 15:59   ` Ming Lei
@ 2012-10-16 20:19     ` Andrew Morton
  -1 siblings, 0 replies; 23+ messages in thread
From: Andrew Morton @ 2012-10-16 20:19 UTC (permalink / raw)
  To: Ming Lei
  Cc: linux-kernel, Alan Stern, Oliver Neukum, Minchan Kim,
	Greg Kroah-Hartman, linux-usb, linux-pm, Jiri Kosina, Mel Gorman,
	KAMEZAWA Hiroyuki, Michal Hocko, Ingo Molnar, Peter Zijlstra,
	Rafael J. Wysocki, linux-mm

On Tue, 16 Oct 2012 23:59:41 +0800
Ming Lei <ming.lei@canonical.com> wrote:

> This patch introduces PF_MEMALLOC_NOIO on process flag('flags' field of
> 'struct task_struct'), so that the flag can be set by one task
> to avoid doing I/O inside memory allocation in the task's context.
> 
> The patch trys to solve one deadlock problem caused by block device,
> and the problem may happen at least in the below situations:
> 
> - during block device runtime resume, if memory allocation with
> GFP_KERNEL is called inside runtime resume callback of any one
> of its ancestors(or the block device itself), the deadlock may be
> triggered inside the memory allocation since it might not complete
> until the block device becomes active and the involed page I/O finishes.
> The situation is pointed out first by Alan Stern. It is not a good
> approach to convert all GFP_KERNEL in the path into GFP_NOIO because
> several subsystems may be involved(for example, PCI, USB and SCSI may
> be involved for usb mass stoarage device)
> 
> - during error handling of usb mass storage deivce, USB bus reset
> will be put on the device, so there shouldn't have any
> memory allocation with GFP_KERNEL during USB bus reset, otherwise
> the deadlock similar with above may be triggered. Unfortunately, any
> usb device may include one mass storage interface in theory, so it
> requires all usb interface drivers to handle the situation. In fact,
> most usb drivers don't know how to handle bus reset on the device
> and don't provide .pre_set() and .post_reset() callback at all, so
> USB core has to unbind and bind driver for these devices. So it
> is still not practical to resort to GFP_NOIO for solving the problem.
> 
> Also the introduced solution can be used by block subsystem or block
> drivers too, for example, set the PF_MEMALLOC_NOIO flag before doing
> actual I/O transfer.

The patch seems reasonable to me.  I'd like to see some examples of
these resume-time callsite which are performing the GFP_KERNEL
allocations, please.  You have found some kernel bugs, so those should
be fully described.

> @@ -1848,6 +1849,16 @@ extern void thread_group_times(struct task_struct *p, cputime_t *ut, cputime_t *
>  #define tsk_used_math(p) ((p)->flags & PF_USED_MATH)
>  #define used_math() tsk_used_math(current)
>  
> +#define memalloc_noio() (current->flags & PF_MEMALLOC_NOIO)
> +#define memalloc_noio_save(noio_flag) do { \
> +	(noio_flag) = current->flags & PF_MEMALLOC_NOIO; \
> +	current->flags |= PF_MEMALLOC_NOIO; \
> +} while (0)
> +#define memalloc_noio_restore(noio_flag) do { \
> +	if (!(noio_flag)) \
> +		current->flags &= ~PF_MEMALLOC_NOIO; \
> +} while (0)

This is just awful.  Why oh why do we write code in macros when we have
a nice C compiler?

These can all be done as nice, clean, type-safe, documented C
functions.  And if they can be done that way, they *should* be done
that way!

And I suggest that a better name for memalloc_noio_save() is
memalloc_noio_set().  So this:

static inline unsigned memalloc_noio(void)
{
	return current->flags & PF_MEMALLOC_NOIO;
}

static inline unsigned memalloc_noio_set(unsigned flags)
{
	unsigned ret = memalloc_noio();

	current->flags |= PF_MEMALLOC_NOIO;
	return ret;
}

static inline unsigned memalloc_noio_restore(unsigned flags)
{
	current->flags = (current->flags & ~PF_MEMALLOC_NOIO) | flags;
}

(I think that's correct?  It's probably more efficient this way).

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFC PATCH v1 1/3] mm: teach mm by current context info to not do I/O during memory allocation
@ 2012-10-16 20:19     ` Andrew Morton
  0 siblings, 0 replies; 23+ messages in thread
From: Andrew Morton @ 2012-10-16 20:19 UTC (permalink / raw)
  To: Ming Lei
  Cc: linux-kernel, Alan Stern, Oliver Neukum, Minchan Kim,
	Greg Kroah-Hartman, linux-usb, linux-pm, Jiri Kosina, Mel Gorman,
	KAMEZAWA Hiroyuki, Michal Hocko, Ingo Molnar, Peter Zijlstra,
	Rafael J. Wysocki, linux-mm

On Tue, 16 Oct 2012 23:59:41 +0800
Ming Lei <ming.lei@canonical.com> wrote:

> This patch introduces PF_MEMALLOC_NOIO on process flag('flags' field of
> 'struct task_struct'), so that the flag can be set by one task
> to avoid doing I/O inside memory allocation in the task's context.
> 
> The patch trys to solve one deadlock problem caused by block device,
> and the problem may happen at least in the below situations:
> 
> - during block device runtime resume, if memory allocation with
> GFP_KERNEL is called inside runtime resume callback of any one
> of its ancestors(or the block device itself), the deadlock may be
> triggered inside the memory allocation since it might not complete
> until the block device becomes active and the involed page I/O finishes.
> The situation is pointed out first by Alan Stern. It is not a good
> approach to convert all GFP_KERNEL in the path into GFP_NOIO because
> several subsystems may be involved(for example, PCI, USB and SCSI may
> be involved for usb mass stoarage device)
> 
> - during error handling of usb mass storage deivce, USB bus reset
> will be put on the device, so there shouldn't have any
> memory allocation with GFP_KERNEL during USB bus reset, otherwise
> the deadlock similar with above may be triggered. Unfortunately, any
> usb device may include one mass storage interface in theory, so it
> requires all usb interface drivers to handle the situation. In fact,
> most usb drivers don't know how to handle bus reset on the device
> and don't provide .pre_set() and .post_reset() callback at all, so
> USB core has to unbind and bind driver for these devices. So it
> is still not practical to resort to GFP_NOIO for solving the problem.
> 
> Also the introduced solution can be used by block subsystem or block
> drivers too, for example, set the PF_MEMALLOC_NOIO flag before doing
> actual I/O transfer.

The patch seems reasonable to me.  I'd like to see some examples of
these resume-time callsite which are performing the GFP_KERNEL
allocations, please.  You have found some kernel bugs, so those should
be fully described.

> @@ -1848,6 +1849,16 @@ extern void thread_group_times(struct task_struct *p, cputime_t *ut, cputime_t *
>  #define tsk_used_math(p) ((p)->flags & PF_USED_MATH)
>  #define used_math() tsk_used_math(current)
>  
> +#define memalloc_noio() (current->flags & PF_MEMALLOC_NOIO)
> +#define memalloc_noio_save(noio_flag) do { \
> +	(noio_flag) = current->flags & PF_MEMALLOC_NOIO; \
> +	current->flags |= PF_MEMALLOC_NOIO; \
> +} while (0)
> +#define memalloc_noio_restore(noio_flag) do { \
> +	if (!(noio_flag)) \
> +		current->flags &= ~PF_MEMALLOC_NOIO; \
> +} while (0)

This is just awful.  Why oh why do we write code in macros when we have
a nice C compiler?

These can all be done as nice, clean, type-safe, documented C
functions.  And if they can be done that way, they *should* be done
that way!

And I suggest that a better name for memalloc_noio_save() is
memalloc_noio_set().  So this:

static inline unsigned memalloc_noio(void)
{
	return current->flags & PF_MEMALLOC_NOIO;
}

static inline unsigned memalloc_noio_set(unsigned flags)
{
	unsigned ret = memalloc_noio();

	current->flags |= PF_MEMALLOC_NOIO;
	return ret;
}

static inline unsigned memalloc_noio_restore(unsigned flags)
{
	current->flags = (current->flags & ~PF_MEMALLOC_NOIO) | flags;
}

(I think that's correct?  It's probably more efficient this way).

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFC PATCH v1 1/3] mm: teach mm by current context info to not do I/O during memory allocation
  2012-10-16 20:19     ` Andrew Morton
@ 2012-10-17  1:54       ` Ming Lei
  -1 siblings, 0 replies; 23+ messages in thread
From: Ming Lei @ 2012-10-17  1:54 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-kernel, Alan Stern, Oliver Neukum, Minchan Kim,
	Greg Kroah-Hartman, linux-usb, linux-pm, Jiri Kosina, Mel Gorman,
	KAMEZAWA Hiroyuki, Michal Hocko, Ingo Molnar, Peter Zijlstra,
	Rafael J. Wysocki, linux-mm

On Wed, Oct 17, 2012 at 4:19 AM, Andrew Morton
<akpm@linux-foundation.org> wrote:
>
> The patch seems reasonable to me.  I'd like to see some examples of
> these resume-time callsite which are performing the GFP_KERNEL
> allocations, please.  You have found some kernel bugs, so those should
> be fully described.

There are two examples on 2/3 and 3/3 of the patchset, see below link:

        http://marc.info/?l=linux-kernel&m=135040325717213&w=2
        http://marc.info/?l=linux-kernel&m=135040327317222&w=2

Sorry for not Cc them to linux-mm because I am afraid of making noise
in mm list.

>
> This is just awful.  Why oh why do we write code in macros when we have
> a nice C compiler?

The two helpers are following style of local_irq_save() and
local_irq_restore(), so that people can use them easily, that is
why I define them as macro instead of inline.

>
> These can all be done as nice, clean, type-safe, documented C
> functions.  And if they can be done that way, they *should* be done
> that way!
>
> And I suggest that a better name for memalloc_noio_save() is
> memalloc_noio_set().  So this:

IMO, renaming as memalloc_noio_set() might not be better than _save
because the _set name doesn't indicate that the flag should be stored first.

>
> static inline unsigned memalloc_noio(void)
> {
>         return current->flags & PF_MEMALLOC_NOIO;
> }
>
> static inline unsigned memalloc_noio_set(unsigned flags)
> {
>         unsigned ret = memalloc_noio();
>
>         current->flags |= PF_MEMALLOC_NOIO;
>         return ret;
> }
>
> static inline unsigned memalloc_noio_restore(unsigned flags)
> {
>         current->flags = (current->flags & ~PF_MEMALLOC_NOIO) | flags;
> }
>
> (I think that's correct?  It's probably more efficient this way).

Yes, it is correct and more clean, and I will take it.

Thanks,
--
Ming Lei

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFC PATCH v1 1/3] mm: teach mm by current context info to not do I/O during memory allocation
@ 2012-10-17  1:54       ` Ming Lei
  0 siblings, 0 replies; 23+ messages in thread
From: Ming Lei @ 2012-10-17  1:54 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-kernel, Alan Stern, Oliver Neukum, Minchan Kim,
	Greg Kroah-Hartman, linux-usb, linux-pm, Jiri Kosina, Mel Gorman,
	KAMEZAWA Hiroyuki, Michal Hocko, Ingo Molnar, Peter Zijlstra,
	Rafael J. Wysocki, linux-mm

On Wed, Oct 17, 2012 at 4:19 AM, Andrew Morton
<akpm@linux-foundation.org> wrote:
>
> The patch seems reasonable to me.  I'd like to see some examples of
> these resume-time callsite which are performing the GFP_KERNEL
> allocations, please.  You have found some kernel bugs, so those should
> be fully described.

There are two examples on 2/3 and 3/3 of the patchset, see below link:

        http://marc.info/?l=linux-kernel&m=135040325717213&w=2
        http://marc.info/?l=linux-kernel&m=135040327317222&w=2

Sorry for not Cc them to linux-mm because I am afraid of making noise
in mm list.

>
> This is just awful.  Why oh why do we write code in macros when we have
> a nice C compiler?

The two helpers are following style of local_irq_save() and
local_irq_restore(), so that people can use them easily, that is
why I define them as macro instead of inline.

>
> These can all be done as nice, clean, type-safe, documented C
> functions.  And if they can be done that way, they *should* be done
> that way!
>
> And I suggest that a better name for memalloc_noio_save() is
> memalloc_noio_set().  So this:

IMO, renaming as memalloc_noio_set() might not be better than _save
because the _set name doesn't indicate that the flag should be stored first.

>
> static inline unsigned memalloc_noio(void)
> {
>         return current->flags & PF_MEMALLOC_NOIO;
> }
>
> static inline unsigned memalloc_noio_set(unsigned flags)
> {
>         unsigned ret = memalloc_noio();
>
>         current->flags |= PF_MEMALLOC_NOIO;
>         return ret;
> }
>
> static inline unsigned memalloc_noio_restore(unsigned flags)
> {
>         current->flags = (current->flags & ~PF_MEMALLOC_NOIO) | flags;
> }
>
> (I think that's correct?  It's probably more efficient this way).

Yes, it is correct and more clean, and I will take it.

Thanks,
--
Ming Lei

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFC PATCH v1 1/3] mm: teach mm by current context info to not do I/O during memory allocation
  2012-10-16 20:19     ` Andrew Morton
@ 2012-10-17  3:40       ` Ming Lei
  -1 siblings, 0 replies; 23+ messages in thread
From: Ming Lei @ 2012-10-17  3:40 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-kernel, Alan Stern, Oliver Neukum, Minchan Kim,
	Greg Kroah-Hartman, linux-usb, linux-pm, Jiri Kosina, Mel Gorman,
	KAMEZAWA Hiroyuki, Michal Hocko, Ingo Molnar, Peter Zijlstra,
	Rafael J. Wysocki, linux-mm

On Wed, Oct 17, 2012 at 4:19 AM, Andrew Morton
<akpm@linux-foundation.org> wrote:
>
> The patch seems reasonable to me.  I'd like to see some examples of
> these resume-time callsite which are performing the GFP_KERNEL
> allocations, please.  You have found some kernel bugs, so those should
> be fully described.

OK, there are two examples of GFP_KERNEL allocation in subsystem
runtime resume path:

1), almost all devices in some pci platform
acpi_os_allocate
	<-acpi_ut_allocate
		<-ACPI_ALLOCATE_ZEROED
			<-acpi_evaluate_object
				<-__acpi_bus_set_power
					<-acpi_bus_set_power
						<-acpi_pci_set_power_state
							<-platform_pci_set_power_state
								<-pci_platform_power_transition
									<-__pci_complete_power_transition
										<-pci_set_power_state
											<-pci_restore_standard_config
												<-pci_pm_runtime_resume

2), all devices in usb subsystem
usb_get_status
	<-finish_port_resume
		<-usb_port_resume
			<-generic_resume
				<-usb_resume_device
					<-usb_resume_both
						<-usb_runtime_resume	

I also have many examples in which GFP_KERNEL allocation
is involved in runtime resume path of individual drivers.

The above two examples just show how difficult to solve the problem
by traditional way, :-)

Also as pointed by Oliver, network driver need memory allocation with
no io in iSCSI runtime resume situation too.

Thanks,
--
Ming Lei

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFC PATCH v1 1/3] mm: teach mm by current context info to not do I/O during memory allocation
@ 2012-10-17  3:40       ` Ming Lei
  0 siblings, 0 replies; 23+ messages in thread
From: Ming Lei @ 2012-10-17  3:40 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-kernel, Alan Stern, Oliver Neukum, Minchan Kim,
	Greg Kroah-Hartman, linux-usb, linux-pm, Jiri Kosina, Mel Gorman,
	KAMEZAWA Hiroyuki, Michal Hocko, Ingo Molnar, Peter Zijlstra,
	Rafael J. Wysocki, linux-mm

On Wed, Oct 17, 2012 at 4:19 AM, Andrew Morton
<akpm@linux-foundation.org> wrote:
>
> The patch seems reasonable to me.  I'd like to see some examples of
> these resume-time callsite which are performing the GFP_KERNEL
> allocations, please.  You have found some kernel bugs, so those should
> be fully described.

OK, there are two examples of GFP_KERNEL allocation in subsystem
runtime resume path:

1), almost all devices in some pci platform
acpi_os_allocate
	<-acpi_ut_allocate
		<-ACPI_ALLOCATE_ZEROED
			<-acpi_evaluate_object
				<-__acpi_bus_set_power
					<-acpi_bus_set_power
						<-acpi_pci_set_power_state
							<-platform_pci_set_power_state
								<-pci_platform_power_transition
									<-__pci_complete_power_transition
										<-pci_set_power_state
											<-pci_restore_standard_config
												<-pci_pm_runtime_resume

2), all devices in usb subsystem
usb_get_status
	<-finish_port_resume
		<-usb_port_resume
			<-generic_resume
				<-usb_resume_device
					<-usb_resume_both
						<-usb_runtime_resume	

I also have many examples in which GFP_KERNEL allocation
is involved in runtime resume path of individual drivers.

The above two examples just show how difficult to solve the problem
by traditional way, :-)

Also as pointed by Oliver, network driver need memory allocation with
no io in iSCSI runtime resume situation too.

Thanks,
--
Ming Lei

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFC PATCH v1 1/3] mm: teach mm by current context info to not do I/O during memory allocation
  2012-10-16 15:59   ` Ming Lei
@ 2012-10-17  5:14     ` Kamezawa Hiroyuki
  -1 siblings, 0 replies; 23+ messages in thread
From: Kamezawa Hiroyuki @ 2012-10-17  5:14 UTC (permalink / raw)
  To: Ming Lei
  Cc: linux-kernel, Alan Stern, Oliver Neukum, Minchan Kim,
	Greg Kroah-Hartman, linux-usb, linux-pm, Jiri Kosina,
	Andrew Morton, Mel Gorman, Michal Hocko, Ingo Molnar,
	Peter Zijlstra, Rafael J. Wysocki, linux-mm

(2012/10/17 0:59), Ming Lei wrote:
> This patch introduces PF_MEMALLOC_NOIO on process flag('flags' field of
> 'struct task_struct'), so that the flag can be set by one task
> to avoid doing I/O inside memory allocation in the task's context.
> 
> The patch trys to solve one deadlock problem caused by block device,
> and the problem may happen at least in the below situations:
> 
> - during block device runtime resume, if memory allocation with
> GFP_KERNEL is called inside runtime resume callback of any one
> of its ancestors(or the block device itself), the deadlock may be
> triggered inside the memory allocation since it might not complete
> until the block device becomes active and the involed page I/O finishes.
> The situation is pointed out first by Alan Stern. It is not a good
> approach to convert all GFP_KERNEL in the path into GFP_NOIO because
> several subsystems may be involved(for example, PCI, USB and SCSI may
> be involved for usb mass stoarage device)
> 
> - during error handling of usb mass storage deivce, USB bus reset
> will be put on the device, so there shouldn't have any
> memory allocation with GFP_KERNEL during USB bus reset, otherwise
> the deadlock similar with above may be triggered. Unfortunately, any
> usb device may include one mass storage interface in theory, so it
> requires all usb interface drivers to handle the situation. In fact,
> most usb drivers don't know how to handle bus reset on the device
> and don't provide .pre_set() and .post_reset() callback at all, so
> USB core has to unbind and bind driver for these devices. So it
> is still not practical to resort to GFP_NOIO for solving the problem.
> 
> Also the introduced solution can be used by block subsystem or block
> drivers too, for example, set the PF_MEMALLOC_NOIO flag before doing
> actual I/O transfer.
> 
> Cc: Alan Stern <stern@rowland.harvard.edu>
> Cc: Oliver Neukum <oneukum@suse.de>
> Cc: Jiri Kosina <jiri.kosina@suse.com>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Mel Gorman <mel@csn.ul.ie>
> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> Cc: Michal Hocko <mhocko@suse.cz>
> Cc: Ingo Molnar <mingo@redhat.com>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
> Cc: linux-mm <linux-mm@kvack.org>
> Signed-off-by: Minchan Kim <minchan@kernel.org>
> Signed-off-by: Ming Lei <ming.lei@canonical.com>
> ---
>   include/linux/sched.h |   11 +++++++++++
>   mm/page_alloc.c       |   10 +++++++++-
>   mm/vmscan.c           |   13 +++++++++++++
>   3 files changed, 33 insertions(+), 1 deletion(-)
> 
> diff --git a/include/linux/sched.h b/include/linux/sched.h
> index f6961c9..c149ae7 100644
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -1811,6 +1811,7 @@ extern void thread_group_times(struct task_struct *p, cputime_t *ut, cputime_t *
>   #define PF_FROZEN	0x00010000	/* frozen for system suspend */
>   #define PF_FSTRANS	0x00020000	/* inside a filesystem transaction */
>   #define PF_KSWAPD	0x00040000	/* I am kswapd */
> +#define PF_MEMALLOC_NOIO 0x00080000	/* Allocating memory without IO involved */
>   #define PF_LESS_THROTTLE 0x00100000	/* Throttle me less: I clean memory */
>   #define PF_KTHREAD	0x00200000	/* I am a kernel thread */
>   #define PF_RANDOMIZE	0x00400000	/* randomize virtual address space */
> @@ -1848,6 +1849,16 @@ extern void thread_group_times(struct task_struct *p, cputime_t *ut, cputime_t *
>   #define tsk_used_math(p) ((p)->flags & PF_USED_MATH)
>   #define used_math() tsk_used_math(current)
>   
> +#define memalloc_noio() (current->flags & PF_MEMALLOC_NOIO)
> +#define memalloc_noio_save(noio_flag) do { \
> +	(noio_flag) = current->flags & PF_MEMALLOC_NOIO; \
> +	current->flags |= PF_MEMALLOC_NOIO; \
> +} while (0)
> +#define memalloc_noio_restore(noio_flag) do { \
> +	if (!(noio_flag)) \
> +		current->flags &= ~PF_MEMALLOC_NOIO; \
> +} while (0)
> +
>   /*
>    * task->jobctl flags
>    */
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 8e1be1c..e3746dd 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -2630,10 +2630,18 @@ retry_cpuset:
>   	page = get_page_from_freelist(gfp_mask|__GFP_HARDWALL, nodemask, order,
>   			zonelist, high_zoneidx, alloc_flags,
>   			preferred_zone, migratetype);
> -	if (unlikely(!page))
> +	if (unlikely(!page)) {
> +		/*
> +		 * Resume, block IO and its error handling path
> +		 * can deadlock because I/O on the device might not
> +		 * complete.
> +		 */
> +		if (unlikely(memalloc_noio()))
> +			gfp_mask &= ~GFP_IOFS;
>   		page = __alloc_pages_slowpath(gfp_mask, order,
>   				zonelist, high_zoneidx, nodemask,
>   				preferred_zone, migratetype);
> +	}
>   
>   	trace_mm_page_alloc(page, order, gfp_mask, migratetype);
>   
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index 1e9aa66..6647805 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -3298,6 +3298,19 @@ static int __zone_reclaim(struct zone *zone, gfp_t gfp_mask, unsigned int order)
>   	};
>   	unsigned long nr_slab_pages0, nr_slab_pages1;
>   
> +	if (unlikely(memalloc_noio())) {
> +		sc.gfp_mask &= ~GFP_IOFS;
> +		shrink.gfp_mask = sc.gfp_mask;
> +		/*
> +		 * We allow to reclaim only clean pages.
> +		 * It can affect RECLAIM_SWAP and RECLAIM_WRITE mode
> +		 * but this is really rare event and allocator can
> +		 * fallback to other zones.
> +		 */
> +		sc.may_writepage = 0;
> +		sc.may_swap = 0;

I think the idea is reasonable. I have a request.

In current implemententation of vmscan.c, it seems sc.may_writepage, sc.may_swap
are handled independent from gfp_mask. 

So, could you drop changes from this patch and handle these flags in another patch
if these flags should be unset if ~GFP_IOFS ?

I think try_to_free_page() path's sc.may_xxxx should be handled in the same way.

Thanks,
-Kame






^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFC PATCH v1 1/3] mm: teach mm by current context info to not do I/O during memory allocation
@ 2012-10-17  5:14     ` Kamezawa Hiroyuki
  0 siblings, 0 replies; 23+ messages in thread
From: Kamezawa Hiroyuki @ 2012-10-17  5:14 UTC (permalink / raw)
  To: Ming Lei
  Cc: linux-kernel, Alan Stern, Oliver Neukum, Minchan Kim,
	Greg Kroah-Hartman, linux-usb, linux-pm, Jiri Kosina,
	Andrew Morton, Mel Gorman, Michal Hocko, Ingo Molnar,
	Peter Zijlstra, Rafael J. Wysocki, linux-mm

(2012/10/17 0:59), Ming Lei wrote:
> This patch introduces PF_MEMALLOC_NOIO on process flag('flags' field of
> 'struct task_struct'), so that the flag can be set by one task
> to avoid doing I/O inside memory allocation in the task's context.
> 
> The patch trys to solve one deadlock problem caused by block device,
> and the problem may happen at least in the below situations:
> 
> - during block device runtime resume, if memory allocation with
> GFP_KERNEL is called inside runtime resume callback of any one
> of its ancestors(or the block device itself), the deadlock may be
> triggered inside the memory allocation since it might not complete
> until the block device becomes active and the involed page I/O finishes.
> The situation is pointed out first by Alan Stern. It is not a good
> approach to convert all GFP_KERNEL in the path into GFP_NOIO because
> several subsystems may be involved(for example, PCI, USB and SCSI may
> be involved for usb mass stoarage device)
> 
> - during error handling of usb mass storage deivce, USB bus reset
> will be put on the device, so there shouldn't have any
> memory allocation with GFP_KERNEL during USB bus reset, otherwise
> the deadlock similar with above may be triggered. Unfortunately, any
> usb device may include one mass storage interface in theory, so it
> requires all usb interface drivers to handle the situation. In fact,
> most usb drivers don't know how to handle bus reset on the device
> and don't provide .pre_set() and .post_reset() callback at all, so
> USB core has to unbind and bind driver for these devices. So it
> is still not practical to resort to GFP_NOIO for solving the problem.
> 
> Also the introduced solution can be used by block subsystem or block
> drivers too, for example, set the PF_MEMALLOC_NOIO flag before doing
> actual I/O transfer.
> 
> Cc: Alan Stern <stern@rowland.harvard.edu>
> Cc: Oliver Neukum <oneukum@suse.de>
> Cc: Jiri Kosina <jiri.kosina@suse.com>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Mel Gorman <mel@csn.ul.ie>
> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> Cc: Michal Hocko <mhocko@suse.cz>
> Cc: Ingo Molnar <mingo@redhat.com>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
> Cc: linux-mm <linux-mm@kvack.org>
> Signed-off-by: Minchan Kim <minchan@kernel.org>
> Signed-off-by: Ming Lei <ming.lei@canonical.com>
> ---
>   include/linux/sched.h |   11 +++++++++++
>   mm/page_alloc.c       |   10 +++++++++-
>   mm/vmscan.c           |   13 +++++++++++++
>   3 files changed, 33 insertions(+), 1 deletion(-)
> 
> diff --git a/include/linux/sched.h b/include/linux/sched.h
> index f6961c9..c149ae7 100644
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -1811,6 +1811,7 @@ extern void thread_group_times(struct task_struct *p, cputime_t *ut, cputime_t *
>   #define PF_FROZEN	0x00010000	/* frozen for system suspend */
>   #define PF_FSTRANS	0x00020000	/* inside a filesystem transaction */
>   #define PF_KSWAPD	0x00040000	/* I am kswapd */
> +#define PF_MEMALLOC_NOIO 0x00080000	/* Allocating memory without IO involved */
>   #define PF_LESS_THROTTLE 0x00100000	/* Throttle me less: I clean memory */
>   #define PF_KTHREAD	0x00200000	/* I am a kernel thread */
>   #define PF_RANDOMIZE	0x00400000	/* randomize virtual address space */
> @@ -1848,6 +1849,16 @@ extern void thread_group_times(struct task_struct *p, cputime_t *ut, cputime_t *
>   #define tsk_used_math(p) ((p)->flags & PF_USED_MATH)
>   #define used_math() tsk_used_math(current)
>   
> +#define memalloc_noio() (current->flags & PF_MEMALLOC_NOIO)
> +#define memalloc_noio_save(noio_flag) do { \
> +	(noio_flag) = current->flags & PF_MEMALLOC_NOIO; \
> +	current->flags |= PF_MEMALLOC_NOIO; \
> +} while (0)
> +#define memalloc_noio_restore(noio_flag) do { \
> +	if (!(noio_flag)) \
> +		current->flags &= ~PF_MEMALLOC_NOIO; \
> +} while (0)
> +
>   /*
>    * task->jobctl flags
>    */
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 8e1be1c..e3746dd 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -2630,10 +2630,18 @@ retry_cpuset:
>   	page = get_page_from_freelist(gfp_mask|__GFP_HARDWALL, nodemask, order,
>   			zonelist, high_zoneidx, alloc_flags,
>   			preferred_zone, migratetype);
> -	if (unlikely(!page))
> +	if (unlikely(!page)) {
> +		/*
> +		 * Resume, block IO and its error handling path
> +		 * can deadlock because I/O on the device might not
> +		 * complete.
> +		 */
> +		if (unlikely(memalloc_noio()))
> +			gfp_mask &= ~GFP_IOFS;
>   		page = __alloc_pages_slowpath(gfp_mask, order,
>   				zonelist, high_zoneidx, nodemask,
>   				preferred_zone, migratetype);
> +	}
>   
>   	trace_mm_page_alloc(page, order, gfp_mask, migratetype);
>   
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index 1e9aa66..6647805 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -3298,6 +3298,19 @@ static int __zone_reclaim(struct zone *zone, gfp_t gfp_mask, unsigned int order)
>   	};
>   	unsigned long nr_slab_pages0, nr_slab_pages1;
>   
> +	if (unlikely(memalloc_noio())) {
> +		sc.gfp_mask &= ~GFP_IOFS;
> +		shrink.gfp_mask = sc.gfp_mask;
> +		/*
> +		 * We allow to reclaim only clean pages.
> +		 * It can affect RECLAIM_SWAP and RECLAIM_WRITE mode
> +		 * but this is really rare event and allocator can
> +		 * fallback to other zones.
> +		 */
> +		sc.may_writepage = 0;
> +		sc.may_swap = 0;

I think the idea is reasonable. I have a request.

In current implemententation of vmscan.c, it seems sc.may_writepage, sc.may_swap
are handled independent from gfp_mask. 

So, could you drop changes from this patch and handle these flags in another patch
if these flags should be unset if ~GFP_IOFS ?

I think try_to_free_page() path's sc.may_xxxx should be handled in the same way.

Thanks,
-Kame





--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFC PATCH v1 2/3] PM / Runtime: force memory allocation with no I/O during runtime_resume callbcack
  2012-10-16 15:59 ` [RFC PATCH v1 2/3] PM / Runtime: force memory allocation with no I/O during runtime_resume callbcack Ming Lei
@ 2012-10-17  5:43   ` Rafael J. Wysocki
  2012-10-17 11:07     ` Ming Lei
  0 siblings, 1 reply; 23+ messages in thread
From: Rafael J. Wysocki @ 2012-10-17  5:43 UTC (permalink / raw)
  To: Ming Lei
  Cc: linux-kernel, Alan Stern, Oliver Neukum, Minchan Kim,
	Greg Kroah-Hartman, linux-usb, linux-pm

On Tuesday 16 of October 2012 23:59:42 Ming Lei wrote:
> This patch applies the introduced memalloc_noio_save() and
> memalloc_noio_restore() to force memory allocation with no I/O
> during runtime_resume callback.
> 
> Cc: Alan Stern <stern@rowland.harvard.edu>
> Cc: Oliver Neukum <oneukum@suse.de>
> Cc: Rafael J. Wysocki <rjw@sisk.pl>
> Signed-off-by: Ming Lei <ming.lei@canonical.com>
> ---
>  drivers/base/power/runtime.c |   14 ++++++++++++++
>  1 file changed, 14 insertions(+)
> 
> diff --git a/drivers/base/power/runtime.c b/drivers/base/power/runtime.c
> index 3148b10..c71a8f0 100644
> --- a/drivers/base/power/runtime.c
> +++ b/drivers/base/power/runtime.c
> @@ -503,6 +503,7 @@ static int rpm_resume(struct device *dev, int rpmflags)
>  	int (*callback)(struct device *);
>  	struct device *parent = NULL;
>  	int retval = 0;
> +	unsigned int noio_flag;
>  
>  	trace_rpm_resume(dev, rpmflags);
>  
> @@ -652,7 +653,20 @@ static int rpm_resume(struct device *dev, int rpmflags)
>  	if (!callback && dev->driver && dev->driver->pm)
>  		callback = dev->driver->pm->runtime_resume;
>  
> +	/*
> +	 * Deadlock might be caused if memory allocation with GFP_KERNEL
> +	 * happens inside runtime_resume callback of one block device's
> +	 * ancestor or the block device itself. The easiest approach is
> +	 * to forbid I/O inside runtime_resume of all devices.
> +	 *
> +	 * In fact, it can be done only if the deivce is a block device
> +	 * or there is one block device descendant. But that may become
> +	 * complicated and not efficient because device tree traversing
> +	 * is involved.
> +	 */
> +	memalloc_noio_save(noio_flag);
>  	retval = rpm_callback(callback, dev);
> +	memalloc_noio_restore(noio_flag);
>  	if (retval) {
>  		__update_runtime_status(dev, RPM_SUSPENDED);
>  		pm_runtime_cancel_pending(dev);

This appears to be a bit too heavy handed.  First of all, it seems to affect
all memory allocations going in parallel with the resume callback.  Second,
it affects all resume callbacks, not only those where the problem really
appears.  As a result, we are likely to get some memory allocation failures
that don't happen without the patch and don't really need to happen at all.

Thanks,
Rafael


-- 
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFC PATCH v1 1/3] mm: teach mm by current context info to not do I/O during memory allocation
  2012-10-17  5:14     ` Kamezawa Hiroyuki
@ 2012-10-17 10:56       ` Ming Lei
  -1 siblings, 0 replies; 23+ messages in thread
From: Ming Lei @ 2012-10-17 10:56 UTC (permalink / raw)
  To: Kamezawa Hiroyuki
  Cc: linux-kernel, Alan Stern, Oliver Neukum, Minchan Kim,
	Greg Kroah-Hartman, linux-usb, linux-pm, Jiri Kosina,
	Andrew Morton, Mel Gorman, Michal Hocko, Ingo Molnar,
	Peter Zijlstra, Rafael J. Wysocki, linux-mm

On Wed, Oct 17, 2012 at 1:14 PM, Kamezawa Hiroyuki
<kamezawa.hiroyu@jp.fujitsu.com> wrote:
>
> I think the idea is reasonable. I have a request.

Thanks for your comment.

>
> In current implemententation of vmscan.c, it seems sc.may_writepage, sc.may_swap
> are handled independent from gfp_mask.
>
> So, could you drop changes from this patch and handle these flags in another patch
> if these flags should be unset if ~GFP_IOFS ?

OK, I agree. In theory,  mm should make sure no I/O is involved if
memory allocation
users passes ~GFP_IOFS.

>
> I think try_to_free_page() path's sc.may_xxxx should be handled in the same way.

Yes, alloc_page_buffers() and dma_alloc_from_contiguous may drop into
the path, so gfp flag should be changed in try_to_free_page() too.


Thanks,
--
Ming Lei

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFC PATCH v1 1/3] mm: teach mm by current context info to not do I/O during memory allocation
@ 2012-10-17 10:56       ` Ming Lei
  0 siblings, 0 replies; 23+ messages in thread
From: Ming Lei @ 2012-10-17 10:56 UTC (permalink / raw)
  To: Kamezawa Hiroyuki
  Cc: linux-kernel, Alan Stern, Oliver Neukum, Minchan Kim,
	Greg Kroah-Hartman, linux-usb, linux-pm, Jiri Kosina,
	Andrew Morton, Mel Gorman, Michal Hocko, Ingo Molnar,
	Peter Zijlstra, Rafael J. Wysocki, linux-mm

On Wed, Oct 17, 2012 at 1:14 PM, Kamezawa Hiroyuki
<kamezawa.hiroyu@jp.fujitsu.com> wrote:
>
> I think the idea is reasonable. I have a request.

Thanks for your comment.

>
> In current implemententation of vmscan.c, it seems sc.may_writepage, sc.may_swap
> are handled independent from gfp_mask.
>
> So, could you drop changes from this patch and handle these flags in another patch
> if these flags should be unset if ~GFP_IOFS ?

OK, I agree. In theory,  mm should make sure no I/O is involved if
memory allocation
users passes ~GFP_IOFS.

>
> I think try_to_free_page() path's sc.may_xxxx should be handled in the same way.

Yes, alloc_page_buffers() and dma_alloc_from_contiguous may drop into
the path, so gfp flag should be changed in try_to_free_page() too.


Thanks,
--
Ming Lei

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFC PATCH v1 2/3] PM / Runtime: force memory allocation with no I/O during runtime_resume callbcack
  2012-10-17  5:43   ` Rafael J. Wysocki
@ 2012-10-17 11:07     ` Ming Lei
  2012-10-18 23:16       ` Rafael J. Wysocki
  0 siblings, 1 reply; 23+ messages in thread
From: Ming Lei @ 2012-10-17 11:07 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: linux-kernel, Alan Stern, Oliver Neukum, Minchan Kim,
	Greg Kroah-Hartman, linux-usb, linux-pm

On Wed, Oct 17, 2012 at 1:43 PM, Rafael J. Wysocki <rjw@sisk.pl> wrote:
>
> This appears to be a bit too heavy handed.  First of all, it seems to affect
> all memory allocations going in parallel with the resume callback.  Second,

No, the flag is per task, only memory allocation inside resume callback
is effected.

> it affects all resume callbacks, not only those where the problem really

We can do it only on block device, block device's ancestor and network
devices(iSCSI case), but that may introduce policy into PM core or add
one flag of memalloc_noio_resume into 'dev_pm_info', could you agree
on it?


Thanks,
--
Ming Lei

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFC PATCH v1 1/3] mm: teach mm by current context info to not do I/O during memory allocation
  2012-10-17  1:54       ` Ming Lei
@ 2012-10-17 23:54         ` Andrew Morton
  -1 siblings, 0 replies; 23+ messages in thread
From: Andrew Morton @ 2012-10-17 23:54 UTC (permalink / raw)
  To: Ming Lei
  Cc: linux-kernel, Alan Stern, Oliver Neukum, Minchan Kim,
	Greg Kroah-Hartman, linux-usb, linux-pm, Jiri Kosina, Mel Gorman,
	KAMEZAWA Hiroyuki, Michal Hocko, Ingo Molnar, Peter Zijlstra,
	Rafael J. Wysocki, linux-mm

On Wed, 17 Oct 2012 09:54:09 +0800
Ming Lei <ming.lei@canonical.com> wrote:

> On Wed, Oct 17, 2012 at 4:19 AM, Andrew Morton
> <akpm@linux-foundation.org> wrote:
> >
> > The patch seems reasonable to me.  I'd like to see some examples of
> > these resume-time callsite which are performing the GFP_KERNEL
> > allocations, please.  You have found some kernel bugs, so those should
> > be fully described.
> 
> There are two examples on 2/3 and 3/3 of the patchset, see below link:
> 
>         http://marc.info/?l=linux-kernel&m=135040325717213&w=2
>         http://marc.info/?l=linux-kernel&m=135040327317222&w=2
> 
> Sorry for not Cc them to linux-mm because I am afraid of making noise
> in mm list.

Don't worry about mailing list noise ;)

> >
> > This is just awful.  Why oh why do we write code in macros when we have
> > a nice C compiler?
> 
> The two helpers are following style of local_irq_save() and
> local_irq_restore(), so that people can use them easily, that is
> why I define them as macro instead of inline.

local_irq_save() and local_irq_restore() were mistakes :( It's silly to
write what appears to be a C function and then have it operate like
Pascal (warning: I last wrote some Pascal in 66 B.C.).

> >
> > These can all be done as nice, clean, type-safe, documented C
> > functions.  And if they can be done that way, they *should* be done
> > that way!
> >
> > And I suggest that a better name for memalloc_noio_save() is
> > memalloc_noio_set().  So this:
> 
> IMO, renaming as memalloc_noio_set() might not be better than _save
> because the _set name doesn't indicate that the flag should be stored first.

You could add __must_check to the function definition to ensure that
all callers save its return value.


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFC PATCH v1 1/3] mm: teach mm by current context info to not do I/O during memory allocation
@ 2012-10-17 23:54         ` Andrew Morton
  0 siblings, 0 replies; 23+ messages in thread
From: Andrew Morton @ 2012-10-17 23:54 UTC (permalink / raw)
  To: Ming Lei
  Cc: linux-kernel, Alan Stern, Oliver Neukum, Minchan Kim,
	Greg Kroah-Hartman, linux-usb, linux-pm, Jiri Kosina, Mel Gorman,
	KAMEZAWA Hiroyuki, Michal Hocko, Ingo Molnar, Peter Zijlstra,
	Rafael J. Wysocki, linux-mm

On Wed, 17 Oct 2012 09:54:09 +0800
Ming Lei <ming.lei@canonical.com> wrote:

> On Wed, Oct 17, 2012 at 4:19 AM, Andrew Morton
> <akpm@linux-foundation.org> wrote:
> >
> > The patch seems reasonable to me.  I'd like to see some examples of
> > these resume-time callsite which are performing the GFP_KERNEL
> > allocations, please.  You have found some kernel bugs, so those should
> > be fully described.
> 
> There are two examples on 2/3 and 3/3 of the patchset, see below link:
> 
>         http://marc.info/?l=linux-kernel&m=135040325717213&w=2
>         http://marc.info/?l=linux-kernel&m=135040327317222&w=2
> 
> Sorry for not Cc them to linux-mm because I am afraid of making noise
> in mm list.

Don't worry about mailing list noise ;)

> >
> > This is just awful.  Why oh why do we write code in macros when we have
> > a nice C compiler?
> 
> The two helpers are following style of local_irq_save() and
> local_irq_restore(), so that people can use them easily, that is
> why I define them as macro instead of inline.

local_irq_save() and local_irq_restore() were mistakes :( It's silly to
write what appears to be a C function and then have it operate like
Pascal (warning: I last wrote some Pascal in 66 B.C.).

> >
> > These can all be done as nice, clean, type-safe, documented C
> > functions.  And if they can be done that way, they *should* be done
> > that way!
> >
> > And I suggest that a better name for memalloc_noio_save() is
> > memalloc_noio_set().  So this:
> 
> IMO, renaming as memalloc_noio_set() might not be better than _save
> because the _set name doesn't indicate that the flag should be stored first.

You could add __must_check to the function definition to ensure that
all callers save its return value.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFC PATCH v1 2/3] PM / Runtime: force memory allocation with no I/O during runtime_resume callbcack
  2012-10-17 11:07     ` Ming Lei
@ 2012-10-18 23:16       ` Rafael J. Wysocki
  2012-10-19  1:41         ` Ming Lei
  0 siblings, 1 reply; 23+ messages in thread
From: Rafael J. Wysocki @ 2012-10-18 23:16 UTC (permalink / raw)
  To: Ming Lei
  Cc: linux-kernel, Alan Stern, Oliver Neukum, Minchan Kim,
	Greg Kroah-Hartman, linux-usb, linux-pm

On Wednesday 17 of October 2012 19:07:25 Ming Lei wrote:
> On Wed, Oct 17, 2012 at 1:43 PM, Rafael J. Wysocki <rjw@sisk.pl> wrote:
> >
> > This appears to be a bit too heavy handed.  First of all, it seems to affect
> > all memory allocations going in parallel with the resume callback.  Second,
> 
> No, the flag is per task, only memory allocation inside resume callback
> is effected.

OK

> > it affects all resume callbacks, not only those where the problem really
> 
> We can do it only on block device, block device's ancestor and network
> devices(iSCSI case), but that may introduce policy into PM core or add
> one flag of memalloc_noio_resume into 'dev_pm_info', could you agree
> on it?

Well, the question is how many runtime resume callbacks actually allocate
memory.  If they are not too many, we can just flag all of them.  Otherwise,
adding a flag may be a better approach.  I'm not sure ATM.

Thanks,
Rafael


-- 
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFC PATCH v1 2/3] PM / Runtime: force memory allocation with no I/O during runtime_resume callbcack
  2012-10-18 23:16       ` Rafael J. Wysocki
@ 2012-10-19  1:41         ` Ming Lei
  0 siblings, 0 replies; 23+ messages in thread
From: Ming Lei @ 2012-10-19  1:41 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: linux-kernel, Alan Stern, Oliver Neukum, Minchan Kim,
	Greg Kroah-Hartman, linux-usb, linux-pm

On Fri, Oct 19, 2012 at 7:16 AM, Rafael J. Wysocki <rjw@sisk.pl> wrote:
>
> Well, the question is how many runtime resume callbacks actually allocate
> memory.  If they are not too many, we can just flag all of them.  Otherwise,

At least, almost all pci devices driver in some platform(acpi) and all usb
devices driver allocate memory with GFP_KERNEL which is done in
subsystem, see below:

       http://marc.info/?l=linux-kernel&m=135044522501229&w=2

I also found that many usb interface drivers(usblp, uvc, gspca, most of
dvb-usb-v2 media drivers, cpia2, az6007, ....) will do it too.

It is not good to convert all these GFP_KERNEL into GFP_NOIO
because the function doing that will be called in many other contexts.

That is just what I have found.  Unfortunately, this allocation can only be
found by human being now, maybe there are many not found since
any function in the resume path(call tree) may allocate memory with
GFP_KERNEL.

In fact, memalloc_noio() can convert some of current static GFP_NOIO
allocation into GFP_KERNEL in other contexts, at least almost all
GFP_NOIO in USB subsystem can be converted into GFP_KERNEL
after applying the approach and make allocation with GFP_IO
only happen in runtime resume/bus reset/block I/O transfer contexts
generally.

> adding a flag may be a better approach.  I'm not sure ATM.

OK, I will prepare -v2 with the flag approach for review.


Thanks,
--
Ming Lei

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFC PATCH v1 1/3] mm: teach mm by current context info to not do I/O during memory allocation
  2012-10-17 23:54         ` Andrew Morton
@ 2012-10-19  3:52           ` Ming Lei
  -1 siblings, 0 replies; 23+ messages in thread
From: Ming Lei @ 2012-10-19  3:52 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-kernel, Alan Stern, Oliver Neukum, Minchan Kim,
	Greg Kroah-Hartman, linux-usb, linux-pm, Jiri Kosina, Mel Gorman,
	KAMEZAWA Hiroyuki, Michal Hocko, Ingo Molnar, Peter Zijlstra,
	Rafael J. Wysocki, linux-mm

On Thu, Oct 18, 2012 at 7:54 AM, Andrew Morton
<akpm@linux-foundation.org> wrote:
>
> local_irq_save() and local_irq_restore() were mistakes :( It's silly to
> write what appears to be a C function and then have it operate like
> Pascal (warning: I last wrote some Pascal in 66 B.C.).

Considered that spin_lock_irqsave/spin_unlock_irqrestore also follow
the style, kernel guys have been accustomed to the usage, I am
inclined to keep that as macro, :-)

>> IMO, renaming as memalloc_noio_set() might not be better than _save
>> because the _set name doesn't indicate that the flag should be stored first.
>
> You could add __must_check to the function definition to ensure that
> all callers save its return value.

Yes, we can do that, but the function name is not better than _save
from readability.

Thanks,
--
Ming Lei

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFC PATCH v1 1/3] mm: teach mm by current context info to not do I/O during memory allocation
@ 2012-10-19  3:52           ` Ming Lei
  0 siblings, 0 replies; 23+ messages in thread
From: Ming Lei @ 2012-10-19  3:52 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-kernel, Alan Stern, Oliver Neukum, Minchan Kim,
	Greg Kroah-Hartman, linux-usb, linux-pm, Jiri Kosina, Mel Gorman,
	KAMEZAWA Hiroyuki, Michal Hocko, Ingo Molnar, Peter Zijlstra,
	Rafael J. Wysocki, linux-mm

On Thu, Oct 18, 2012 at 7:54 AM, Andrew Morton
<akpm@linux-foundation.org> wrote:
>
> local_irq_save() and local_irq_restore() were mistakes :( It's silly to
> write what appears to be a C function and then have it operate like
> Pascal (warning: I last wrote some Pascal in 66 B.C.).

Considered that spin_lock_irqsave/spin_unlock_irqrestore also follow
the style, kernel guys have been accustomed to the usage, I am
inclined to keep that as macro, :-)

>> IMO, renaming as memalloc_noio_set() might not be better than _save
>> because the _set name doesn't indicate that the flag should be stored first.
>
> You could add __must_check to the function definition to ensure that
> all callers save its return value.

Yes, we can do that, but the function name is not better than _save
from readability.

Thanks,
--
Ming Lei

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 23+ messages in thread

end of thread, other threads:[~2012-10-19  3:52 UTC | newest]

Thread overview: 23+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-10-16 15:59 [RFC PATCH v1 0/3] mm/PM/USB: memory allocation with no io in need Ming Lei
2012-10-16 15:59 ` [RFC PATCH v1 1/3] mm: teach mm by current context info to not do I/O during memory allocation Ming Lei
2012-10-16 15:59   ` Ming Lei
2012-10-16 20:19   ` Andrew Morton
2012-10-16 20:19     ` Andrew Morton
2012-10-17  1:54     ` Ming Lei
2012-10-17  1:54       ` Ming Lei
2012-10-17 23:54       ` Andrew Morton
2012-10-17 23:54         ` Andrew Morton
2012-10-19  3:52         ` Ming Lei
2012-10-19  3:52           ` Ming Lei
2012-10-17  3:40     ` Ming Lei
2012-10-17  3:40       ` Ming Lei
2012-10-17  5:14   ` Kamezawa Hiroyuki
2012-10-17  5:14     ` Kamezawa Hiroyuki
2012-10-17 10:56     ` Ming Lei
2012-10-17 10:56       ` Ming Lei
2012-10-16 15:59 ` [RFC PATCH v1 2/3] PM / Runtime: force memory allocation with no I/O during runtime_resume callbcack Ming Lei
2012-10-17  5:43   ` Rafael J. Wysocki
2012-10-17 11:07     ` Ming Lei
2012-10-18 23:16       ` Rafael J. Wysocki
2012-10-19  1:41         ` Ming Lei
2012-10-16 15:59 ` [RFC PATCH v1 3/3] USB: forbid memory allocation with I/O during bus reset Ming Lei

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.