All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/7] Kill PF_MEMALLOC abuse
@ 2009-11-17  7:16 ` KOSAKI Motohiro
  0 siblings, 0 replies; 108+ messages in thread
From: KOSAKI Motohiro @ 2009-11-17  7:16 UTC (permalink / raw)
  To: linux-mm, LKML, Andrew Morton; +Cc: kosaki.motohiro


PF_MEMALLOC have following effects.
  (1) Ignore zone watermark
  (2) Don't call reclaim although allocation failure, instead return ENOMEM
  (3) Don't invoke OOM Killer
  (4) Don't retry internally in page alloc

Some subsystem paid attention (1) only, and start to use PF_MEMALLOC abuse.
But, the fact is, PF_MEMALLOC is the promise of "I have lots freeable memory.
if I allocate few memory, I can return more much meory to the system!".
Non MM subsystem must not use PF_MEMALLOC. Memory reclaim
need few memory, anyone must not prevent it. Otherwise the system cause
mysterious hang-up and/or OOM Killer invokation.

if many subsystem will be able to use emergency memory without any
usage rule, it isn't for emergency. it can become empty easily.

Plus, characteristics (2)-(4) mean PF_MEMALLOC don't fit to general
high priority memory allocation.

Thus, We kill all PF_MEMALLOC usage in no MM subsystem.




^ permalink raw reply	[flat|nested] 108+ messages in thread

* [PATCH 0/7] Kill PF_MEMALLOC abuse
@ 2009-11-17  7:16 ` KOSAKI Motohiro
  0 siblings, 0 replies; 108+ messages in thread
From: KOSAKI Motohiro @ 2009-11-17  7:16 UTC (permalink / raw)
  To: linux-mm, LKML, Andrew Morton; +Cc: kosaki.motohiro


PF_MEMALLOC have following effects.
  (1) Ignore zone watermark
  (2) Don't call reclaim although allocation failure, instead return ENOMEM
  (3) Don't invoke OOM Killer
  (4) Don't retry internally in page alloc

Some subsystem paid attention (1) only, and start to use PF_MEMALLOC abuse.
But, the fact is, PF_MEMALLOC is the promise of "I have lots freeable memory.
if I allocate few memory, I can return more much meory to the system!".
Non MM subsystem must not use PF_MEMALLOC. Memory reclaim
need few memory, anyone must not prevent it. Otherwise the system cause
mysterious hang-up and/or OOM Killer invokation.

if many subsystem will be able to use emergency memory without any
usage rule, it isn't for emergency. it can become empty easily.

Plus, characteristics (2)-(4) mean PF_MEMALLOC don't fit to general
high priority memory allocation.

Thus, We kill all PF_MEMALLOC usage in no MM subsystem.



--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 108+ messages in thread

* [PATCH 1/7] dm: use __GFP_HIGH instead PF_MEMALLOC
  2009-11-17  7:16 ` KOSAKI Motohiro
@ 2009-11-17  7:17   ` KOSAKI Motohiro
  -1 siblings, 0 replies; 108+ messages in thread
From: KOSAKI Motohiro @ 2009-11-17  7:17 UTC (permalink / raw)
  To: LKML
  Cc: kosaki.motohiro, linux-mm, Andrew Morton, Alasdair G Kergon, dm-devel

Non MM subsystem must not use PF_MEMALLOC. Memory reclaim need few
memory, anyone must not prevent it. Otherwise the system cause
mysterious hang-up and/or OOM Killer invokation.

Cc: Alasdair G Kergon <agk@redhat.com>
Cc: dm-devel@redhat.com
Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
---
 drivers/md/dm-ioctl.c |   18 +++++++-----------
 1 files changed, 7 insertions(+), 11 deletions(-)

diff --git a/drivers/md/dm-ioctl.c b/drivers/md/dm-ioctl.c
index a679429..4d24b0a 100644
--- a/drivers/md/dm-ioctl.c
+++ b/drivers/md/dm-ioctl.c
@@ -1396,7 +1396,13 @@ static int copy_params(struct dm_ioctl __user *user, struct dm_ioctl **param)
 	if (tmp.data_size < (sizeof(tmp) - sizeof(tmp.data)))
 		return -EINVAL;
 
-	dmi = vmalloc(tmp.data_size);
+
+	/*
+	 * We use __vmalloc(__GFP_HIGH) instead vmalloc() because trying to
+	 * avoid low memory issues when a device is suspended.
+	 */
+	dmi = __vmalloc(tmp.data_size, GFP_KERNEL | __GFP_HIGHMEM | __GFP_HIGH,
+			PAGE_KERNEL);
 	if (!dmi)
 		return -ENOMEM;
 
@@ -1473,20 +1479,10 @@ static int ctl_ioctl(uint command, struct dm_ioctl __user *user)
 		DMWARN("dm_ctl_ioctl: unknown command 0x%x", command);
 		return -ENOTTY;
 	}
-
-	/*
-	 * Trying to avoid low memory issues when a device is
-	 * suspended.
-	 */
-	current->flags |= PF_MEMALLOC;
-
 	/*
 	 * Copy the parameters into kernel space.
 	 */
 	r = copy_params(user, &param);
-
-	current->flags &= ~PF_MEMALLOC;
-
 	if (r)
 		return r;
 
-- 
1.6.2.5




^ permalink raw reply related	[flat|nested] 108+ messages in thread

* [PATCH 1/7] dm: use __GFP_HIGH instead PF_MEMALLOC
@ 2009-11-17  7:17   ` KOSAKI Motohiro
  0 siblings, 0 replies; 108+ messages in thread
From: KOSAKI Motohiro @ 2009-11-17  7:17 UTC (permalink / raw)
  To: LKML
  Cc: kosaki.motohiro, linux-mm, Andrew Morton, Alasdair G Kergon, dm-devel

Non MM subsystem must not use PF_MEMALLOC. Memory reclaim need few
memory, anyone must not prevent it. Otherwise the system cause
mysterious hang-up and/or OOM Killer invokation.

Cc: Alasdair G Kergon <agk@redhat.com>
Cc: dm-devel@redhat.com
Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
---
 drivers/md/dm-ioctl.c |   18 +++++++-----------
 1 files changed, 7 insertions(+), 11 deletions(-)

diff --git a/drivers/md/dm-ioctl.c b/drivers/md/dm-ioctl.c
index a679429..4d24b0a 100644
--- a/drivers/md/dm-ioctl.c
+++ b/drivers/md/dm-ioctl.c
@@ -1396,7 +1396,13 @@ static int copy_params(struct dm_ioctl __user *user, struct dm_ioctl **param)
 	if (tmp.data_size < (sizeof(tmp) - sizeof(tmp.data)))
 		return -EINVAL;
 
-	dmi = vmalloc(tmp.data_size);
+
+	/*
+	 * We use __vmalloc(__GFP_HIGH) instead vmalloc() because trying to
+	 * avoid low memory issues when a device is suspended.
+	 */
+	dmi = __vmalloc(tmp.data_size, GFP_KERNEL | __GFP_HIGHMEM | __GFP_HIGH,
+			PAGE_KERNEL);
 	if (!dmi)
 		return -ENOMEM;
 
@@ -1473,20 +1479,10 @@ static int ctl_ioctl(uint command, struct dm_ioctl __user *user)
 		DMWARN("dm_ctl_ioctl: unknown command 0x%x", command);
 		return -ENOTTY;
 	}
-
-	/*
-	 * Trying to avoid low memory issues when a device is
-	 * suspended.
-	 */
-	current->flags |= PF_MEMALLOC;
-
 	/*
 	 * Copy the parameters into kernel space.
 	 */
 	r = copy_params(user, &param);
-
-	current->flags &= ~PF_MEMALLOC;
-
 	if (r)
 		return r;
 
-- 
1.6.2.5



--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 108+ messages in thread

* [PATCH 2/7] mmc: Don't use PF_MEMALLOC
  2009-11-17  7:16 ` KOSAKI Motohiro
@ 2009-11-17  7:17   ` KOSAKI Motohiro
  -1 siblings, 0 replies; 108+ messages in thread
From: KOSAKI Motohiro @ 2009-11-17  7:17 UTC (permalink / raw)
  To: LKML; +Cc: kosaki.motohiro, linux-mm, Andrew Morton, linux-mmc

Non MM subsystem must not use PF_MEMALLOC. Memory reclaim need few
memory, anyone must not prevent it. Otherwise the system cause
mysterious hang-up and/or OOM Killer invokation.

Cc: linux-mmc@vger.kernel.org
Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
---
 drivers/mmc/card/queue.c |    2 --
 1 files changed, 0 insertions(+), 2 deletions(-)

diff --git a/drivers/mmc/card/queue.c b/drivers/mmc/card/queue.c
index 49e5823..5deb996 100644
--- a/drivers/mmc/card/queue.c
+++ b/drivers/mmc/card/queue.c
@@ -46,8 +46,6 @@ static int mmc_queue_thread(void *d)
 	struct mmc_queue *mq = d;
 	struct request_queue *q = mq->queue;
 
-	current->flags |= PF_MEMALLOC;
-
 	down(&mq->thread_sem);
 	do {
 		struct request *req = NULL;
-- 
1.6.2.5




^ permalink raw reply related	[flat|nested] 108+ messages in thread

* [PATCH 2/7] mmc: Don't use PF_MEMALLOC
@ 2009-11-17  7:17   ` KOSAKI Motohiro
  0 siblings, 0 replies; 108+ messages in thread
From: KOSAKI Motohiro @ 2009-11-17  7:17 UTC (permalink / raw)
  To: LKML; +Cc: kosaki.motohiro, linux-mm, Andrew Morton, linux-mmc

Non MM subsystem must not use PF_MEMALLOC. Memory reclaim need few
memory, anyone must not prevent it. Otherwise the system cause
mysterious hang-up and/or OOM Killer invokation.

Cc: linux-mmc@vger.kernel.org
Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
---
 drivers/mmc/card/queue.c |    2 --
 1 files changed, 0 insertions(+), 2 deletions(-)

diff --git a/drivers/mmc/card/queue.c b/drivers/mmc/card/queue.c
index 49e5823..5deb996 100644
--- a/drivers/mmc/card/queue.c
+++ b/drivers/mmc/card/queue.c
@@ -46,8 +46,6 @@ static int mmc_queue_thread(void *d)
 	struct mmc_queue *mq = d;
 	struct request_queue *q = mq->queue;
 
-	current->flags |= PF_MEMALLOC;
-
 	down(&mq->thread_sem);
 	do {
 		struct request *req = NULL;
-- 
1.6.2.5



--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 108+ messages in thread

* [PATCH 3/7] mtd: Don't use PF_MEMALLOC
  2009-11-17  7:16 ` KOSAKI Motohiro
  (?)
@ 2009-11-17  7:18   ` KOSAKI Motohiro
  -1 siblings, 0 replies; 108+ messages in thread
From: KOSAKI Motohiro @ 2009-11-17  7:18 UTC (permalink / raw)
  To: LKML; +Cc: kosaki.motohiro, linux-mm, Andrew Morton, David Woodhouse, linux-mtd


Non MM subsystem must not use PF_MEMALLOC. Memory reclaim need few
memory, anyone must not prevent it. Otherwise the system cause
mysterious hang-up and/or OOM Killer invokation.

Cc: David Woodhouse <dwmw2@infradead.org>
Cc: linux-mtd@lists.infradead.org
Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
---
 drivers/mtd/mtd_blkdevs.c |    3 ---
 1 files changed, 0 insertions(+), 3 deletions(-)

diff --git a/drivers/mtd/mtd_blkdevs.c b/drivers/mtd/mtd_blkdevs.c
index 8ca17a3..7be5ac3 100644
--- a/drivers/mtd/mtd_blkdevs.c
+++ b/drivers/mtd/mtd_blkdevs.c
@@ -82,9 +82,6 @@ static int mtd_blktrans_thread(void *arg)
 	struct request_queue *rq = tr->blkcore_priv->rq;
 	struct request *req = NULL;
 
-	/* we might get involved when memory gets low, so use PF_MEMALLOC */
-	current->flags |= PF_MEMALLOC;
-
 	spin_lock_irq(rq->queue_lock);
 
 	while (!kthread_should_stop()) {
-- 
1.6.2.5




^ permalink raw reply related	[flat|nested] 108+ messages in thread

* [PATCH 3/7] mtd: Don't use PF_MEMALLOC
@ 2009-11-17  7:18   ` KOSAKI Motohiro
  0 siblings, 0 replies; 108+ messages in thread
From: KOSAKI Motohiro @ 2009-11-17  7:18 UTC (permalink / raw)
  To: LKML; +Cc: kosaki.motohiro, linux-mm, Andrew Morton, David Woodhouse, linux-mtd


Non MM subsystem must not use PF_MEMALLOC. Memory reclaim need few
memory, anyone must not prevent it. Otherwise the system cause
mysterious hang-up and/or OOM Killer invokation.

Cc: David Woodhouse <dwmw2@infradead.org>
Cc: linux-mtd@lists.infradead.org
Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
---
 drivers/mtd/mtd_blkdevs.c |    3 ---
 1 files changed, 0 insertions(+), 3 deletions(-)

diff --git a/drivers/mtd/mtd_blkdevs.c b/drivers/mtd/mtd_blkdevs.c
index 8ca17a3..7be5ac3 100644
--- a/drivers/mtd/mtd_blkdevs.c
+++ b/drivers/mtd/mtd_blkdevs.c
@@ -82,9 +82,6 @@ static int mtd_blktrans_thread(void *arg)
 	struct request_queue *rq = tr->blkcore_priv->rq;
 	struct request *req = NULL;
 
-	/* we might get involved when memory gets low, so use PF_MEMALLOC */
-	current->flags |= PF_MEMALLOC;
-
 	spin_lock_irq(rq->queue_lock);
 
 	while (!kthread_should_stop()) {
-- 
1.6.2.5



--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 108+ messages in thread

* [PATCH 3/7] mtd: Don't use PF_MEMALLOC
@ 2009-11-17  7:18   ` KOSAKI Motohiro
  0 siblings, 0 replies; 108+ messages in thread
From: KOSAKI Motohiro @ 2009-11-17  7:18 UTC (permalink / raw)
  To: LKML; +Cc: linux-mtd, linux-mm, Andrew Morton, David Woodhouse, kosaki.motohiro


Non MM subsystem must not use PF_MEMALLOC. Memory reclaim need few
memory, anyone must not prevent it. Otherwise the system cause
mysterious hang-up and/or OOM Killer invokation.

Cc: David Woodhouse <dwmw2@infradead.org>
Cc: linux-mtd@lists.infradead.org
Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
---
 drivers/mtd/mtd_blkdevs.c |    3 ---
 1 files changed, 0 insertions(+), 3 deletions(-)

diff --git a/drivers/mtd/mtd_blkdevs.c b/drivers/mtd/mtd_blkdevs.c
index 8ca17a3..7be5ac3 100644
--- a/drivers/mtd/mtd_blkdevs.c
+++ b/drivers/mtd/mtd_blkdevs.c
@@ -82,9 +82,6 @@ static int mtd_blktrans_thread(void *arg)
 	struct request_queue *rq = tr->blkcore_priv->rq;
 	struct request *req = NULL;
 
-	/* we might get involved when memory gets low, so use PF_MEMALLOC */
-	current->flags |= PF_MEMALLOC;
-
 	spin_lock_irq(rq->queue_lock);
 
 	while (!kthread_should_stop()) {
-- 
1.6.2.5

^ permalink raw reply related	[flat|nested] 108+ messages in thread

* [PATCH 4/7] nandsim: Don't use PF_MEMALLOC
  2009-11-17  7:16 ` KOSAKI Motohiro
  (?)
@ 2009-11-17  7:19   ` KOSAKI Motohiro
  -1 siblings, 0 replies; 108+ messages in thread
From: KOSAKI Motohiro @ 2009-11-17  7:19 UTC (permalink / raw)
  To: LKML
  Cc: kosaki.motohiro, linux-mm, Andrew Morton, David Woodhouse,
	Artem Bityutskiy, linux-mtd


Non MM subsystem must not use PF_MEMALLOC. Memory reclaim need few
memory, anyone must not prevent it. Otherwise the system cause
mysterious hang-up and/or OOM Killer invokation.

Cc: David Woodhouse <David.Woodhouse@intel.com>
Cc: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
Cc: linux-mtd@lists.infradead.org
Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
---
 drivers/mtd/nand/nandsim.c |   22 ++--------------------
 1 files changed, 2 insertions(+), 20 deletions(-)

diff --git a/drivers/mtd/nand/nandsim.c b/drivers/mtd/nand/nandsim.c
index cd0711b..97a8bbb 100644
--- a/drivers/mtd/nand/nandsim.c
+++ b/drivers/mtd/nand/nandsim.c
@@ -1322,34 +1322,18 @@ static int get_pages(struct nandsim *ns, struct file *file, size_t count, loff_t
 	return 0;
 }
 
-static int set_memalloc(void)
-{
-	if (current->flags & PF_MEMALLOC)
-		return 0;
-	current->flags |= PF_MEMALLOC;
-	return 1;
-}
-
-static void clear_memalloc(int memalloc)
-{
-	if (memalloc)
-		current->flags &= ~PF_MEMALLOC;
-}
-
 static ssize_t read_file(struct nandsim *ns, struct file *file, void *buf, size_t count, loff_t *pos)
 {
 	mm_segment_t old_fs;
 	ssize_t tx;
-	int err, memalloc;
+	int err;
 
 	err = get_pages(ns, file, count, *pos);
 	if (err)
 		return err;
 	old_fs = get_fs();
 	set_fs(get_ds());
-	memalloc = set_memalloc();
 	tx = vfs_read(file, (char __user *)buf, count, pos);
-	clear_memalloc(memalloc);
 	set_fs(old_fs);
 	put_pages(ns);
 	return tx;
@@ -1359,16 +1343,14 @@ static ssize_t write_file(struct nandsim *ns, struct file *file, void *buf, size
 {
 	mm_segment_t old_fs;
 	ssize_t tx;
-	int err, memalloc;
+	int err;
 
 	err = get_pages(ns, file, count, *pos);
 	if (err)
 		return err;
 	old_fs = get_fs();
 	set_fs(get_ds());
-	memalloc = set_memalloc();
 	tx = vfs_write(file, (char __user *)buf, count, pos);
-	clear_memalloc(memalloc);
 	set_fs(old_fs);
 	put_pages(ns);
 	return tx;
-- 
1.6.2.5




^ permalink raw reply related	[flat|nested] 108+ messages in thread

* [PATCH 4/7] nandsim: Don't use PF_MEMALLOC
@ 2009-11-17  7:19   ` KOSAKI Motohiro
  0 siblings, 0 replies; 108+ messages in thread
From: KOSAKI Motohiro @ 2009-11-17  7:19 UTC (permalink / raw)
  To: LKML
  Cc: kosaki.motohiro, linux-mm, Andrew Morton, David Woodhouse,
	Artem Bityutskiy, linux-mtd


Non MM subsystem must not use PF_MEMALLOC. Memory reclaim need few
memory, anyone must not prevent it. Otherwise the system cause
mysterious hang-up and/or OOM Killer invokation.

Cc: David Woodhouse <David.Woodhouse@intel.com>
Cc: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
Cc: linux-mtd@lists.infradead.org
Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
---
 drivers/mtd/nand/nandsim.c |   22 ++--------------------
 1 files changed, 2 insertions(+), 20 deletions(-)

diff --git a/drivers/mtd/nand/nandsim.c b/drivers/mtd/nand/nandsim.c
index cd0711b..97a8bbb 100644
--- a/drivers/mtd/nand/nandsim.c
+++ b/drivers/mtd/nand/nandsim.c
@@ -1322,34 +1322,18 @@ static int get_pages(struct nandsim *ns, struct file *file, size_t count, loff_t
 	return 0;
 }
 
-static int set_memalloc(void)
-{
-	if (current->flags & PF_MEMALLOC)
-		return 0;
-	current->flags |= PF_MEMALLOC;
-	return 1;
-}
-
-static void clear_memalloc(int memalloc)
-{
-	if (memalloc)
-		current->flags &= ~PF_MEMALLOC;
-}
-
 static ssize_t read_file(struct nandsim *ns, struct file *file, void *buf, size_t count, loff_t *pos)
 {
 	mm_segment_t old_fs;
 	ssize_t tx;
-	int err, memalloc;
+	int err;
 
 	err = get_pages(ns, file, count, *pos);
 	if (err)
 		return err;
 	old_fs = get_fs();
 	set_fs(get_ds());
-	memalloc = set_memalloc();
 	tx = vfs_read(file, (char __user *)buf, count, pos);
-	clear_memalloc(memalloc);
 	set_fs(old_fs);
 	put_pages(ns);
 	return tx;
@@ -1359,16 +1343,14 @@ static ssize_t write_file(struct nandsim *ns, struct file *file, void *buf, size
 {
 	mm_segment_t old_fs;
 	ssize_t tx;
-	int err, memalloc;
+	int err;
 
 	err = get_pages(ns, file, count, *pos);
 	if (err)
 		return err;
 	old_fs = get_fs();
 	set_fs(get_ds());
-	memalloc = set_memalloc();
 	tx = vfs_write(file, (char __user *)buf, count, pos);
-	clear_memalloc(memalloc);
 	set_fs(old_fs);
 	put_pages(ns);
 	return tx;
-- 
1.6.2.5



--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 108+ messages in thread

* [PATCH 4/7] nandsim: Don't use PF_MEMALLOC
@ 2009-11-17  7:19   ` KOSAKI Motohiro
  0 siblings, 0 replies; 108+ messages in thread
From: KOSAKI Motohiro @ 2009-11-17  7:19 UTC (permalink / raw)
  To: LKML
  Cc: Artem Bityutskiy, David Woodhouse, linux-mm, linux-mtd,
	kosaki.motohiro, Andrew Morton


Non MM subsystem must not use PF_MEMALLOC. Memory reclaim need few
memory, anyone must not prevent it. Otherwise the system cause
mysterious hang-up and/or OOM Killer invokation.

Cc: David Woodhouse <David.Woodhouse@intel.com>
Cc: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
Cc: linux-mtd@lists.infradead.org
Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
---
 drivers/mtd/nand/nandsim.c |   22 ++--------------------
 1 files changed, 2 insertions(+), 20 deletions(-)

diff --git a/drivers/mtd/nand/nandsim.c b/drivers/mtd/nand/nandsim.c
index cd0711b..97a8bbb 100644
--- a/drivers/mtd/nand/nandsim.c
+++ b/drivers/mtd/nand/nandsim.c
@@ -1322,34 +1322,18 @@ static int get_pages(struct nandsim *ns, struct file *file, size_t count, loff_t
 	return 0;
 }
 
-static int set_memalloc(void)
-{
-	if (current->flags & PF_MEMALLOC)
-		return 0;
-	current->flags |= PF_MEMALLOC;
-	return 1;
-}
-
-static void clear_memalloc(int memalloc)
-{
-	if (memalloc)
-		current->flags &= ~PF_MEMALLOC;
-}
-
 static ssize_t read_file(struct nandsim *ns, struct file *file, void *buf, size_t count, loff_t *pos)
 {
 	mm_segment_t old_fs;
 	ssize_t tx;
-	int err, memalloc;
+	int err;
 
 	err = get_pages(ns, file, count, *pos);
 	if (err)
 		return err;
 	old_fs = get_fs();
 	set_fs(get_ds());
-	memalloc = set_memalloc();
 	tx = vfs_read(file, (char __user *)buf, count, pos);
-	clear_memalloc(memalloc);
 	set_fs(old_fs);
 	put_pages(ns);
 	return tx;
@@ -1359,16 +1343,14 @@ static ssize_t write_file(struct nandsim *ns, struct file *file, void *buf, size
 {
 	mm_segment_t old_fs;
 	ssize_t tx;
-	int err, memalloc;
+	int err;
 
 	err = get_pages(ns, file, count, *pos);
 	if (err)
 		return err;
 	old_fs = get_fs();
 	set_fs(get_ds());
-	memalloc = set_memalloc();
 	tx = vfs_write(file, (char __user *)buf, count, pos);
-	clear_memalloc(memalloc);
 	set_fs(old_fs);
 	put_pages(ns);
 	return tx;
-- 
1.6.2.5

^ permalink raw reply related	[flat|nested] 108+ messages in thread

* [PATCH 5/7] Revert "Intel IOMMU: Avoid memory allocation failures in dma map api calls"
  2009-11-17  7:16 ` KOSAKI Motohiro
@ 2009-11-17  7:21   ` KOSAKI Motohiro
  -1 siblings, 0 replies; 108+ messages in thread
From: KOSAKI Motohiro @ 2009-11-17  7:21 UTC (permalink / raw)
  To: LKML
  Cc: kosaki.motohiro, linux-mm, Andrew Morton, Keshavamurthy Anil S,
	David Woodhouse, iommu


commit eb3fa7cb51 said Intel IOMMU

    Intel IOMMU driver needs memory during DMA map calls to setup its
    internal page tables and for other data structures.  As we all know
    that these DMA map calls are mostly called in the interrupt context
    or with the spinlock held by the upper level drivers(network/storage
    drivers), so in order to avoid any memory allocation failure due to
    low memory issues, this patch makes memory allocation by temporarily
    setting PF_MEMALLOC flags for the current task before making memory
    allocation calls.

    We evaluated mempools as a backup when kmem_cache_alloc() fails
    and found that mempools are really not useful here because
     1) We don't know for sure how much to reserve in advance
     2) And mempools are not useful for GFP_ATOMIC case (as we call
        memory alloc functions with GFP_ATOMIC)

    (akpm: point 2 is wrong...)

The above description doesn't justify to waste system emergency memory
at all. Non MM subsystem must not use PF_MEMALLOC. Memory reclaim need
few memory, anyone must not prevent it. Otherwise the system cause
mysterious hang-up and/or OOM Killer invokation.

Plus, akpm already pointed out what we should do.

Then, this patch revert it.

Cc: Keshavamurthy Anil S <anil.s.keshavamurthy@intel.com>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: iommu@lists.linux-foundation.org
Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
---
 drivers/pci/intel-iommu.c |   30 ++++--------------------------
 1 files changed, 4 insertions(+), 26 deletions(-)

diff --git a/drivers/pci/intel-iommu.c b/drivers/pci/intel-iommu.c
index 1840a05..17d6f1e 100644
--- a/drivers/pci/intel-iommu.c
+++ b/drivers/pci/intel-iommu.c
@@ -386,31 +386,9 @@ static struct kmem_cache *iommu_domain_cache;
 static struct kmem_cache *iommu_devinfo_cache;
 static struct kmem_cache *iommu_iova_cache;
 
-static inline void *iommu_kmem_cache_alloc(struct kmem_cache *cachep)
-{
-	unsigned int flags;
-	void *vaddr;
-
-	/* trying to avoid low memory issues */
-	flags = current->flags & PF_MEMALLOC;
-	current->flags |= PF_MEMALLOC;
-	vaddr = kmem_cache_alloc(cachep, GFP_ATOMIC);
-	current->flags &= (~PF_MEMALLOC | flags);
-	return vaddr;
-}
-
-
 static inline void *alloc_pgtable_page(void)
 {
-	unsigned int flags;
-	void *vaddr;
-
-	/* trying to avoid low memory issues */
-	flags = current->flags & PF_MEMALLOC;
-	current->flags |= PF_MEMALLOC;
-	vaddr = (void *)get_zeroed_page(GFP_ATOMIC);
-	current->flags &= (~PF_MEMALLOC | flags);
-	return vaddr;
+	return (void *)get_zeroed_page(GFP_ATOMIC);
 }
 
 static inline void free_pgtable_page(void *vaddr)
@@ -420,7 +398,7 @@ static inline void free_pgtable_page(void *vaddr)
 
 static inline void *alloc_domain_mem(void)
 {
-	return iommu_kmem_cache_alloc(iommu_domain_cache);
+	return kmem_cache_alloc(iommu_domain_cache, GFP_ATOMIC);
 }
 
 static void free_domain_mem(void *vaddr)
@@ -430,7 +408,7 @@ static void free_domain_mem(void *vaddr)
 
 static inline void * alloc_devinfo_mem(void)
 {
-	return iommu_kmem_cache_alloc(iommu_devinfo_cache);
+	return kmem_cache_alloc(iommu_devinfo_cache, GFP_ATOMIC);
 }
 
 static inline void free_devinfo_mem(void *vaddr)
@@ -440,7 +418,7 @@ static inline void free_devinfo_mem(void *vaddr)
 
 struct iova *alloc_iova_mem(void)
 {
-	return iommu_kmem_cache_alloc(iommu_iova_cache);
+	return kmem_cache_alloc(iommu_iova_cache, GFP_ATOMIC);
 }
 
 void free_iova_mem(struct iova *iova)
-- 
1.6.2.5




^ permalink raw reply related	[flat|nested] 108+ messages in thread

* [PATCH 5/7] Revert "Intel IOMMU: Avoid memory allocation failures in dma map api calls"
@ 2009-11-17  7:21   ` KOSAKI Motohiro
  0 siblings, 0 replies; 108+ messages in thread
From: KOSAKI Motohiro @ 2009-11-17  7:21 UTC (permalink / raw)
  To: LKML
  Cc: kosaki.motohiro, linux-mm, Andrew Morton, Keshavamurthy Anil S,
	David Woodhouse, iommu


commit eb3fa7cb51 said Intel IOMMU

    Intel IOMMU driver needs memory during DMA map calls to setup its
    internal page tables and for other data structures.  As we all know
    that these DMA map calls are mostly called in the interrupt context
    or with the spinlock held by the upper level drivers(network/storage
    drivers), so in order to avoid any memory allocation failure due to
    low memory issues, this patch makes memory allocation by temporarily
    setting PF_MEMALLOC flags for the current task before making memory
    allocation calls.

    We evaluated mempools as a backup when kmem_cache_alloc() fails
    and found that mempools are really not useful here because
     1) We don't know for sure how much to reserve in advance
     2) And mempools are not useful for GFP_ATOMIC case (as we call
        memory alloc functions with GFP_ATOMIC)

    (akpm: point 2 is wrong...)

The above description doesn't justify to waste system emergency memory
at all. Non MM subsystem must not use PF_MEMALLOC. Memory reclaim need
few memory, anyone must not prevent it. Otherwise the system cause
mysterious hang-up and/or OOM Killer invokation.

Plus, akpm already pointed out what we should do.

Then, this patch revert it.

Cc: Keshavamurthy Anil S <anil.s.keshavamurthy@intel.com>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: iommu@lists.linux-foundation.org
Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
---
 drivers/pci/intel-iommu.c |   30 ++++--------------------------
 1 files changed, 4 insertions(+), 26 deletions(-)

diff --git a/drivers/pci/intel-iommu.c b/drivers/pci/intel-iommu.c
index 1840a05..17d6f1e 100644
--- a/drivers/pci/intel-iommu.c
+++ b/drivers/pci/intel-iommu.c
@@ -386,31 +386,9 @@ static struct kmem_cache *iommu_domain_cache;
 static struct kmem_cache *iommu_devinfo_cache;
 static struct kmem_cache *iommu_iova_cache;
 
-static inline void *iommu_kmem_cache_alloc(struct kmem_cache *cachep)
-{
-	unsigned int flags;
-	void *vaddr;
-
-	/* trying to avoid low memory issues */
-	flags = current->flags & PF_MEMALLOC;
-	current->flags |= PF_MEMALLOC;
-	vaddr = kmem_cache_alloc(cachep, GFP_ATOMIC);
-	current->flags &= (~PF_MEMALLOC | flags);
-	return vaddr;
-}
-
-
 static inline void *alloc_pgtable_page(void)
 {
-	unsigned int flags;
-	void *vaddr;
-
-	/* trying to avoid low memory issues */
-	flags = current->flags & PF_MEMALLOC;
-	current->flags |= PF_MEMALLOC;
-	vaddr = (void *)get_zeroed_page(GFP_ATOMIC);
-	current->flags &= (~PF_MEMALLOC | flags);
-	return vaddr;
+	return (void *)get_zeroed_page(GFP_ATOMIC);
 }
 
 static inline void free_pgtable_page(void *vaddr)
@@ -420,7 +398,7 @@ static inline void free_pgtable_page(void *vaddr)
 
 static inline void *alloc_domain_mem(void)
 {
-	return iommu_kmem_cache_alloc(iommu_domain_cache);
+	return kmem_cache_alloc(iommu_domain_cache, GFP_ATOMIC);
 }
 
 static void free_domain_mem(void *vaddr)
@@ -430,7 +408,7 @@ static void free_domain_mem(void *vaddr)
 
 static inline void * alloc_devinfo_mem(void)
 {
-	return iommu_kmem_cache_alloc(iommu_devinfo_cache);
+	return kmem_cache_alloc(iommu_devinfo_cache, GFP_ATOMIC);
 }
 
 static inline void free_devinfo_mem(void *vaddr)
@@ -440,7 +418,7 @@ static inline void free_devinfo_mem(void *vaddr)
 
 struct iova *alloc_iova_mem(void)
 {
-	return iommu_kmem_cache_alloc(iommu_iova_cache);
+	return kmem_cache_alloc(iommu_iova_cache, GFP_ATOMIC);
 }
 
 void free_iova_mem(struct iova *iova)
-- 
1.6.2.5



--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 108+ messages in thread

* [PATCH 6/7] cifs: Don't use PF_MEMALLOC
  2009-11-17  7:16 ` KOSAKI Motohiro
@ 2009-11-17  7:22   ` KOSAKI Motohiro
  -1 siblings, 0 replies; 108+ messages in thread
From: KOSAKI Motohiro @ 2009-11-17  7:22 UTC (permalink / raw)
  To: LKML
  Cc: kosaki.motohiro, linux-mm, Andrew Morton, Steve French,
	linux-cifs-client, samba-technical


Non MM subsystem must not use PF_MEMALLOC. Memory reclaim need few
memory, anyone must not prevent it. Otherwise the system cause
mysterious hang-up and/or OOM Killer invokation.

Cc: Steve French <sfrench@samba.org>
Cc: linux-cifs-client@lists.samba.org
Cc: samba-technical@lists.samba.org
Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
---
 fs/cifs/connect.c |    1 -
 1 files changed, 0 insertions(+), 1 deletions(-)

diff --git a/fs/cifs/connect.c b/fs/cifs/connect.c
index 63ea83f..f9b1553 100644
--- a/fs/cifs/connect.c
+++ b/fs/cifs/connect.c
@@ -337,7 +337,6 @@ cifs_demultiplex_thread(struct TCP_Server_Info *server)
 	bool isMultiRsp;
 	int reconnect;
 
-	current->flags |= PF_MEMALLOC;
 	cFYI(1, ("Demultiplex PID: %d", task_pid_nr(current)));
 
 	length = atomic_inc_return(&tcpSesAllocCount);
-- 
1.6.2.5




^ permalink raw reply related	[flat|nested] 108+ messages in thread

* [PATCH 6/7] cifs: Don't use PF_MEMALLOC
@ 2009-11-17  7:22   ` KOSAKI Motohiro
  0 siblings, 0 replies; 108+ messages in thread
From: KOSAKI Motohiro @ 2009-11-17  7:22 UTC (permalink / raw)
  To: LKML
  Cc: kosaki.motohiro, linux-mm, Andrew Morton, Steve French,
	linux-cifs-client, samba-technical


Non MM subsystem must not use PF_MEMALLOC. Memory reclaim need few
memory, anyone must not prevent it. Otherwise the system cause
mysterious hang-up and/or OOM Killer invokation.

Cc: Steve French <sfrench@samba.org>
Cc: linux-cifs-client@lists.samba.org
Cc: samba-technical@lists.samba.org
Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
---
 fs/cifs/connect.c |    1 -
 1 files changed, 0 insertions(+), 1 deletions(-)

diff --git a/fs/cifs/connect.c b/fs/cifs/connect.c
index 63ea83f..f9b1553 100644
--- a/fs/cifs/connect.c
+++ b/fs/cifs/connect.c
@@ -337,7 +337,6 @@ cifs_demultiplex_thread(struct TCP_Server_Info *server)
 	bool isMultiRsp;
 	int reconnect;
 
-	current->flags |= PF_MEMALLOC;
 	cFYI(1, ("Demultiplex PID: %d", task_pid_nr(current)));
 
 	length = atomic_inc_return(&tcpSesAllocCount);
-- 
1.6.2.5



--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 108+ messages in thread

* [PATCH 7/7] xfs: Don't use PF_MEMALLOC
  2009-11-17  7:16 ` KOSAKI Motohiro
  (?)
@ 2009-11-17  7:23   ` KOSAKI Motohiro
  -1 siblings, 0 replies; 108+ messages in thread
From: KOSAKI Motohiro @ 2009-11-17  7:23 UTC (permalink / raw)
  To: LKML
  Cc: kosaki.motohiro, linux-mm, Andrew Morton, Christoph Hellwig,
	linux-fsdevel, xfs-masters, xfs


Non MM subsystem must not use PF_MEMALLOC. Memory reclaim need few
memory, anyone must not prevent it. Otherwise the system cause
mysterious hang-up and/or OOM Killer invokation.

Cc: Christoph Hellwig <hch@infradead.org>
Cc: linux-fsdevel@vger.kernel.org
Cc: xfs-masters@oss.sgi.com
Cc: xfs@oss.sgi.com
Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
---
 fs/xfs/linux-2.6/xfs_buf.c |    2 --
 1 files changed, 0 insertions(+), 2 deletions(-)

diff --git a/fs/xfs/linux-2.6/xfs_buf.c b/fs/xfs/linux-2.6/xfs_buf.c
index 965df12..b9a06fc 100644
--- a/fs/xfs/linux-2.6/xfs_buf.c
+++ b/fs/xfs/linux-2.6/xfs_buf.c
@@ -1724,8 +1724,6 @@ xfsbufd(
 	int		count;
 	xfs_buf_t	*bp;
 
-	current->flags |= PF_MEMALLOC;
-
 	set_freezable();
 
 	do {
-- 
1.6.2.5




^ permalink raw reply related	[flat|nested] 108+ messages in thread

* [PATCH 7/7] xfs: Don't use PF_MEMALLOC
@ 2009-11-17  7:23   ` KOSAKI Motohiro
  0 siblings, 0 replies; 108+ messages in thread
From: KOSAKI Motohiro @ 2009-11-17  7:23 UTC (permalink / raw)
  To: LKML
  Cc: xfs-masters, xfs, Christoph Hellwig, linux-mm, kosaki.motohiro,
	linux-fsdevel, Andrew Morton


Non MM subsystem must not use PF_MEMALLOC. Memory reclaim need few
memory, anyone must not prevent it. Otherwise the system cause
mysterious hang-up and/or OOM Killer invokation.

Cc: Christoph Hellwig <hch@infradead.org>
Cc: linux-fsdevel@vger.kernel.org
Cc: xfs-masters@oss.sgi.com
Cc: xfs@oss.sgi.com
Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
---
 fs/xfs/linux-2.6/xfs_buf.c |    2 --
 1 files changed, 0 insertions(+), 2 deletions(-)

diff --git a/fs/xfs/linux-2.6/xfs_buf.c b/fs/xfs/linux-2.6/xfs_buf.c
index 965df12..b9a06fc 100644
--- a/fs/xfs/linux-2.6/xfs_buf.c
+++ b/fs/xfs/linux-2.6/xfs_buf.c
@@ -1724,8 +1724,6 @@ xfsbufd(
 	int		count;
 	xfs_buf_t	*bp;
 
-	current->flags |= PF_MEMALLOC;
-
 	set_freezable();
 
 	do {
-- 
1.6.2.5



_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply related	[flat|nested] 108+ messages in thread

* [PATCH 7/7] xfs: Don't use PF_MEMALLOC
@ 2009-11-17  7:23   ` KOSAKI Motohiro
  0 siblings, 0 replies; 108+ messages in thread
From: KOSAKI Motohiro @ 2009-11-17  7:23 UTC (permalink / raw)
  To: LKML
  Cc: kosaki.motohiro, linux-mm, Andrew Morton, Christoph Hellwig,
	linux-fsdevel, xfs-masters, xfs


Non MM subsystem must not use PF_MEMALLOC. Memory reclaim need few
memory, anyone must not prevent it. Otherwise the system cause
mysterious hang-up and/or OOM Killer invokation.

Cc: Christoph Hellwig <hch@infradead.org>
Cc: linux-fsdevel@vger.kernel.org
Cc: xfs-masters@oss.sgi.com
Cc: xfs@oss.sgi.com
Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
---
 fs/xfs/linux-2.6/xfs_buf.c |    2 --
 1 files changed, 0 insertions(+), 2 deletions(-)

diff --git a/fs/xfs/linux-2.6/xfs_buf.c b/fs/xfs/linux-2.6/xfs_buf.c
index 965df12..b9a06fc 100644
--- a/fs/xfs/linux-2.6/xfs_buf.c
+++ b/fs/xfs/linux-2.6/xfs_buf.c
@@ -1724,8 +1724,6 @@ xfsbufd(
 	int		count;
 	xfs_buf_t	*bp;
 
-	current->flags |= PF_MEMALLOC;
-
 	set_freezable();
 
 	do {
-- 
1.6.2.5



--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 108+ messages in thread

* [PATCH] Mark cifs mailing list as "moderated as non-subscribers"
  2009-11-17  7:22   ` KOSAKI Motohiro
@ 2009-11-17  7:32     ` KOSAKI Motohiro
  -1 siblings, 0 replies; 108+ messages in thread
From: KOSAKI Motohiro @ 2009-11-17  7:32 UTC (permalink / raw)
  To: LKML; +Cc: kosaki.motohiro, linux-mm, Andrew Morton, Steve French


if non-subscribers post bug report to CIFS mailing list, they will get
following messages.

	Your mail to 'linux-cifs-client' with the subject

	    [PATCH x/x] cifs: xxxxxxxxxxxxx

	Is being held until the list moderator can review it for approval.

	The reason it is being held:

	    Post by non-member to a members-only list

	Either the message will get posted to the list, or you will receive
	notification of the moderator's decision.  If you would like to cancel
	this posting, please visit the following URL:

members-only list should be written as so in MAINTAINERS file.


Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
---
 MAINTAINERS |    4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index 81d68d5..5c44047 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -1418,8 +1418,8 @@ F:	include/linux/coda*.h
 
 COMMON INTERNET FILE SYSTEM (CIFS)
 M:	Steve French <sfrench@samba.org>
-L:	linux-cifs-client@lists.samba.org
-L:	samba-technical@lists.samba.org
+L:	linux-cifs-client@lists.samba.org (moderated for non-subscribers)
+L:	samba-technical@lists.samba.org (moderated for non-subscribers)
 W:	http://linux-cifs.samba.org/
 T:	git git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6.git
 S:	Supported
-- 
1.6.2.5




^ permalink raw reply related	[flat|nested] 108+ messages in thread

* [PATCH] Mark cifs mailing list as "moderated as non-subscribers"
@ 2009-11-17  7:32     ` KOSAKI Motohiro
  0 siblings, 0 replies; 108+ messages in thread
From: KOSAKI Motohiro @ 2009-11-17  7:32 UTC (permalink / raw)
  To: LKML; +Cc: kosaki.motohiro, linux-mm, Andrew Morton, Steve French


if non-subscribers post bug report to CIFS mailing list, they will get
following messages.

	Your mail to 'linux-cifs-client' with the subject

	    [PATCH x/x] cifs: xxxxxxxxxxxxx

	Is being held until the list moderator can review it for approval.

	The reason it is being held:

	    Post by non-member to a members-only list

	Either the message will get posted to the list, or you will receive
	notification of the moderator's decision.  If you would like to cancel
	this posting, please visit the following URL:

members-only list should be written as so in MAINTAINERS file.


Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
---
 MAINTAINERS |    4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index 81d68d5..5c44047 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -1418,8 +1418,8 @@ F:	include/linux/coda*.h
 
 COMMON INTERNET FILE SYSTEM (CIFS)
 M:	Steve French <sfrench@samba.org>
-L:	linux-cifs-client@lists.samba.org
-L:	samba-technical@lists.samba.org
+L:	linux-cifs-client@lists.samba.org (moderated for non-subscribers)
+L:	samba-technical@lists.samba.org (moderated for non-subscribers)
 W:	http://linux-cifs.samba.org/
 T:	git git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6.git
 S:	Supported
-- 
1.6.2.5



--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 108+ messages in thread

* Re: [PATCH 0/7] Kill PF_MEMALLOC abuse
  2009-11-17  7:16 ` KOSAKI Motohiro
@ 2009-11-17  8:07   ` David Rientjes
  -1 siblings, 0 replies; 108+ messages in thread
From: David Rientjes @ 2009-11-17  8:07 UTC (permalink / raw)
  To: KOSAKI Motohiro; +Cc: linux-mm, LKML, Andrew Morton

On Tue, 17 Nov 2009, KOSAKI Motohiro wrote:

> 
> PF_MEMALLOC have following effects.
>   (1) Ignore zone watermark
>   (2) Don't call reclaim although allocation failure, instead return ENOMEM
>   (3) Don't invoke OOM Killer
>   (4) Don't retry internally in page alloc
> 
> Some subsystem paid attention (1) only, and start to use PF_MEMALLOC abuse.
> But, the fact is, PF_MEMALLOC is the promise of "I have lots freeable memory.
> if I allocate few memory, I can return more much meory to the system!".
> Non MM subsystem must not use PF_MEMALLOC. Memory reclaim
> need few memory, anyone must not prevent it. Otherwise the system cause
> mysterious hang-up and/or OOM Killer invokation.
> 
> if many subsystem will be able to use emergency memory without any
> usage rule, it isn't for emergency. it can become empty easily.
> 
> Plus, characteristics (2)-(4) mean PF_MEMALLOC don't fit to general
> high priority memory allocation.
> 
> Thus, We kill all PF_MEMALLOC usage in no MM subsystem.
> 

I agree in principle with removing non-VM users of PF_MEMALLOC, but I 
think it should be left to the individual subsystem maintainers to apply 
or ack since the allocations may depend on the __GFP_NORETRY | ~__GFP_WAIT 
behavior of PF_MEMALLOC.  This could be potentially dangerous for a 
PF_MEMALLOC user if allocations made by the kthread, for example, should 
never retry for orders smaller than PAGE_ALLOC_COSTLY_ORDER or block on 
direct reclaim.

^ permalink raw reply	[flat|nested] 108+ messages in thread

* Re: [PATCH 0/7] Kill PF_MEMALLOC abuse
@ 2009-11-17  8:07   ` David Rientjes
  0 siblings, 0 replies; 108+ messages in thread
From: David Rientjes @ 2009-11-17  8:07 UTC (permalink / raw)
  To: KOSAKI Motohiro; +Cc: linux-mm, LKML, Andrew Morton

On Tue, 17 Nov 2009, KOSAKI Motohiro wrote:

> 
> PF_MEMALLOC have following effects.
>   (1) Ignore zone watermark
>   (2) Don't call reclaim although allocation failure, instead return ENOMEM
>   (3) Don't invoke OOM Killer
>   (4) Don't retry internally in page alloc
> 
> Some subsystem paid attention (1) only, and start to use PF_MEMALLOC abuse.
> But, the fact is, PF_MEMALLOC is the promise of "I have lots freeable memory.
> if I allocate few memory, I can return more much meory to the system!".
> Non MM subsystem must not use PF_MEMALLOC. Memory reclaim
> need few memory, anyone must not prevent it. Otherwise the system cause
> mysterious hang-up and/or OOM Killer invokation.
> 
> if many subsystem will be able to use emergency memory without any
> usage rule, it isn't for emergency. it can become empty easily.
> 
> Plus, characteristics (2)-(4) mean PF_MEMALLOC don't fit to general
> high priority memory allocation.
> 
> Thus, We kill all PF_MEMALLOC usage in no MM subsystem.
> 

I agree in principle with removing non-VM users of PF_MEMALLOC, but I 
think it should be left to the individual subsystem maintainers to apply 
or ack since the allocations may depend on the __GFP_NORETRY | ~__GFP_WAIT 
behavior of PF_MEMALLOC.  This could be potentially dangerous for a 
PF_MEMALLOC user if allocations made by the kthread, for example, should 
never retry for orders smaller than PAGE_ALLOC_COSTLY_ORDER or block on 
direct reclaim.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 108+ messages in thread

* Re: [PATCH 0/7] Kill PF_MEMALLOC abuse
  2009-11-17  8:07   ` David Rientjes
@ 2009-11-17  8:33     ` KOSAKI Motohiro
  -1 siblings, 0 replies; 108+ messages in thread
From: KOSAKI Motohiro @ 2009-11-17  8:33 UTC (permalink / raw)
  To: David Rientjes; +Cc: kosaki.motohiro, linux-mm, LKML, Andrew Morton

> On Tue, 17 Nov 2009, KOSAKI Motohiro wrote:
> 
> > 
> > PF_MEMALLOC have following effects.
> >   (1) Ignore zone watermark
> >   (2) Don't call reclaim although allocation failure, instead return ENOMEM
> >   (3) Don't invoke OOM Killer
> >   (4) Don't retry internally in page alloc
> > 
> > Some subsystem paid attention (1) only, and start to use PF_MEMALLOC abuse.
> > But, the fact is, PF_MEMALLOC is the promise of "I have lots freeable memory.
> > if I allocate few memory, I can return more much meory to the system!".
> > Non MM subsystem must not use PF_MEMALLOC. Memory reclaim
> > need few memory, anyone must not prevent it. Otherwise the system cause
> > mysterious hang-up and/or OOM Killer invokation.
> > 
> > if many subsystem will be able to use emergency memory without any
> > usage rule, it isn't for emergency. it can become empty easily.
> > 
> > Plus, characteristics (2)-(4) mean PF_MEMALLOC don't fit to general
> > high priority memory allocation.
> > 
> > Thus, We kill all PF_MEMALLOC usage in no MM subsystem.
> 
> I agree in principle with removing non-VM users of PF_MEMALLOC, but I 
> think it should be left to the individual subsystem maintainers to apply 
> or ack since the allocations may depend on the __GFP_NORETRY | ~__GFP_WAIT 
> behavior of PF_MEMALLOC.  This could be potentially dangerous for a 
> PF_MEMALLOC user if allocations made by the kthread, for example, should 
> never retry for orders smaller than PAGE_ALLOC_COSTLY_ORDER or block on 
> direct reclaim.

if there is so such reason. we might need to implement another MM trick.
but keeping this strage usage is not a option. All memory freeing activity
(e.g. page out, task killing) need some memory. we need to protect its
emergency memory. otherwise linux reliability decrease dramatically when
the system face to memory stress.






^ permalink raw reply	[flat|nested] 108+ messages in thread

* Re: [PATCH 0/7] Kill PF_MEMALLOC abuse
@ 2009-11-17  8:33     ` KOSAKI Motohiro
  0 siblings, 0 replies; 108+ messages in thread
From: KOSAKI Motohiro @ 2009-11-17  8:33 UTC (permalink / raw)
  To: David Rientjes; +Cc: kosaki.motohiro, linux-mm, LKML, Andrew Morton

> On Tue, 17 Nov 2009, KOSAKI Motohiro wrote:
> 
> > 
> > PF_MEMALLOC have following effects.
> >   (1) Ignore zone watermark
> >   (2) Don't call reclaim although allocation failure, instead return ENOMEM
> >   (3) Don't invoke OOM Killer
> >   (4) Don't retry internally in page alloc
> > 
> > Some subsystem paid attention (1) only, and start to use PF_MEMALLOC abuse.
> > But, the fact is, PF_MEMALLOC is the promise of "I have lots freeable memory.
> > if I allocate few memory, I can return more much meory to the system!".
> > Non MM subsystem must not use PF_MEMALLOC. Memory reclaim
> > need few memory, anyone must not prevent it. Otherwise the system cause
> > mysterious hang-up and/or OOM Killer invokation.
> > 
> > if many subsystem will be able to use emergency memory without any
> > usage rule, it isn't for emergency. it can become empty easily.
> > 
> > Plus, characteristics (2)-(4) mean PF_MEMALLOC don't fit to general
> > high priority memory allocation.
> > 
> > Thus, We kill all PF_MEMALLOC usage in no MM subsystem.
> 
> I agree in principle with removing non-VM users of PF_MEMALLOC, but I 
> think it should be left to the individual subsystem maintainers to apply 
> or ack since the allocations may depend on the __GFP_NORETRY | ~__GFP_WAIT 
> behavior of PF_MEMALLOC.  This could be potentially dangerous for a 
> PF_MEMALLOC user if allocations made by the kthread, for example, should 
> never retry for orders smaller than PAGE_ALLOC_COSTLY_ORDER or block on 
> direct reclaim.

if there is so such reason. we might need to implement another MM trick.
but keeping this strage usage is not a option. All memory freeing activity
(e.g. page out, task killing) need some memory. we need to protect its
emergency memory. otherwise linux reliability decrease dramatically when
the system face to memory stress.





--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 108+ messages in thread

* Re: [PATCH 0/7] Kill PF_MEMALLOC abuse
  2009-11-17  8:33     ` KOSAKI Motohiro
@ 2009-11-17  8:36       ` David Rientjes
  -1 siblings, 0 replies; 108+ messages in thread
From: David Rientjes @ 2009-11-17  8:36 UTC (permalink / raw)
  To: KOSAKI Motohiro; +Cc: linux-mm, LKML, Andrew Morton

On Tue, 17 Nov 2009, KOSAKI Motohiro wrote:

> > I agree in principle with removing non-VM users of PF_MEMALLOC, but I 
> > think it should be left to the individual subsystem maintainers to apply 
> > or ack since the allocations may depend on the __GFP_NORETRY | ~__GFP_WAIT 
> > behavior of PF_MEMALLOC.  This could be potentially dangerous for a 
> > PF_MEMALLOC user if allocations made by the kthread, for example, should 
> > never retry for orders smaller than PAGE_ALLOC_COSTLY_ORDER or block on 
> > direct reclaim.
> 
> if there is so such reason. we might need to implement another MM trick.
> but keeping this strage usage is not a option. All memory freeing activity
> (e.g. page out, task killing) need some memory. we need to protect its
> emergency memory. otherwise linux reliability decrease dramatically when
> the system face to memory stress.
> 

Right, that's why I agree with trying to remove non-VM use of PF_MEMALLOC, 
but I think this patchset needs to go through the individual subsystem 
maintainers so they can ensure the conversion doesn't cause undesirable 
results if their kthreads' memory allocations depend on the __GFP_NORETRY 
behavior that PF_MEMALLOC ensures.  Otherwise it looks good.

^ permalink raw reply	[flat|nested] 108+ messages in thread

* Re: [PATCH 0/7] Kill PF_MEMALLOC abuse
@ 2009-11-17  8:36       ` David Rientjes
  0 siblings, 0 replies; 108+ messages in thread
From: David Rientjes @ 2009-11-17  8:36 UTC (permalink / raw)
  To: KOSAKI Motohiro; +Cc: linux-mm, LKML, Andrew Morton

On Tue, 17 Nov 2009, KOSAKI Motohiro wrote:

> > I agree in principle with removing non-VM users of PF_MEMALLOC, but I 
> > think it should be left to the individual subsystem maintainers to apply 
> > or ack since the allocations may depend on the __GFP_NORETRY | ~__GFP_WAIT 
> > behavior of PF_MEMALLOC.  This could be potentially dangerous for a 
> > PF_MEMALLOC user if allocations made by the kthread, for example, should 
> > never retry for orders smaller than PAGE_ALLOC_COSTLY_ORDER or block on 
> > direct reclaim.
> 
> if there is so such reason. we might need to implement another MM trick.
> but keeping this strage usage is not a option. All memory freeing activity
> (e.g. page out, task killing) need some memory. we need to protect its
> emergency memory. otherwise linux reliability decrease dramatically when
> the system face to memory stress.
> 

Right, that's why I agree with trying to remove non-VM use of PF_MEMALLOC, 
but I think this patchset needs to go through the individual subsystem 
maintainers so they can ensure the conversion doesn't cause undesirable 
results if their kthreads' memory allocations depend on the __GFP_NORETRY 
behavior that PF_MEMALLOC ensures.  Otherwise it looks good.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 108+ messages in thread

* Re: [PATCH 0/7] Kill PF_MEMALLOC abuse
  2009-11-17  7:16 ` KOSAKI Motohiro
@ 2009-11-17 10:15   ` Christoph Hellwig
  -1 siblings, 0 replies; 108+ messages in thread
From: Christoph Hellwig @ 2009-11-17 10:15 UTC (permalink / raw)
  To: KOSAKI Motohiro; +Cc: linux-mm, LKML, Andrew Morton

On Tue, Nov 17, 2009 at 04:16:04PM +0900, KOSAKI Motohiro wrote:
> Some subsystem paid attention (1) only, and start to use PF_MEMALLOC abuse.
> But, the fact is, PF_MEMALLOC is the promise of "I have lots freeable memory.
> if I allocate few memory, I can return more much meory to the system!".
> Non MM subsystem must not use PF_MEMALLOC. Memory reclaim
> need few memory, anyone must not prevent it. Otherwise the system cause
> mysterious hang-up and/or OOM Killer invokation.

And that's exactly the promises xfsbufd gives.  It writes out dirty
metadata buffers and will free lots of memory if you kick it. 


^ permalink raw reply	[flat|nested] 108+ messages in thread

* Re: [PATCH 0/7] Kill PF_MEMALLOC abuse
@ 2009-11-17 10:15   ` Christoph Hellwig
  0 siblings, 0 replies; 108+ messages in thread
From: Christoph Hellwig @ 2009-11-17 10:15 UTC (permalink / raw)
  To: KOSAKI Motohiro; +Cc: linux-mm, LKML, Andrew Morton

On Tue, Nov 17, 2009 at 04:16:04PM +0900, KOSAKI Motohiro wrote:
> Some subsystem paid attention (1) only, and start to use PF_MEMALLOC abuse.
> But, the fact is, PF_MEMALLOC is the promise of "I have lots freeable memory.
> if I allocate few memory, I can return more much meory to the system!".
> Non MM subsystem must not use PF_MEMALLOC. Memory reclaim
> need few memory, anyone must not prevent it. Otherwise the system cause
> mysterious hang-up and/or OOM Killer invokation.

And that's exactly the promises xfsbufd gives.  It writes out dirty
metadata buffers and will free lots of memory if you kick it. 

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 108+ messages in thread

* Re: [PATCH 0/7] Kill PF_MEMALLOC abuse
  2009-11-17 10:15   ` Christoph Hellwig
@ 2009-11-17 10:24     ` KOSAKI Motohiro
  -1 siblings, 0 replies; 108+ messages in thread
From: KOSAKI Motohiro @ 2009-11-17 10:24 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: kosaki.motohiro, linux-mm, LKML, Andrew Morton

> On Tue, Nov 17, 2009 at 04:16:04PM +0900, KOSAKI Motohiro wrote:
> > Some subsystem paid attention (1) only, and start to use PF_MEMALLOC abuse.
> > But, the fact is, PF_MEMALLOC is the promise of "I have lots freeable memory.
> > if I allocate few memory, I can return more much meory to the system!".
> > Non MM subsystem must not use PF_MEMALLOC. Memory reclaim
> > need few memory, anyone must not prevent it. Otherwise the system cause
> > mysterious hang-up and/or OOM Killer invokation.
> 
> And that's exactly the promises xfsbufd gives.  It writes out dirty
> metadata buffers and will free lots of memory if you kick it. 

if xfsbufd doesn't only write out dirty data but also drop page,
I agree you. 



^ permalink raw reply	[flat|nested] 108+ messages in thread

* Re: [PATCH 0/7] Kill PF_MEMALLOC abuse
@ 2009-11-17 10:24     ` KOSAKI Motohiro
  0 siblings, 0 replies; 108+ messages in thread
From: KOSAKI Motohiro @ 2009-11-17 10:24 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: kosaki.motohiro, linux-mm, LKML, Andrew Morton

> On Tue, Nov 17, 2009 at 04:16:04PM +0900, KOSAKI Motohiro wrote:
> > Some subsystem paid attention (1) only, and start to use PF_MEMALLOC abuse.
> > But, the fact is, PF_MEMALLOC is the promise of "I have lots freeable memory.
> > if I allocate few memory, I can return more much meory to the system!".
> > Non MM subsystem must not use PF_MEMALLOC. Memory reclaim
> > need few memory, anyone must not prevent it. Otherwise the system cause
> > mysterious hang-up and/or OOM Killer invokation.
> 
> And that's exactly the promises xfsbufd gives.  It writes out dirty
> metadata buffers and will free lots of memory if you kick it. 

if xfsbufd doesn't only write out dirty data but also drop page,
I agree you. 


--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 108+ messages in thread

* Re: [PATCH 0/7] Kill PF_MEMALLOC abuse
  2009-11-17 10:24     ` KOSAKI Motohiro
@ 2009-11-17 10:27       ` Christoph Hellwig
  -1 siblings, 0 replies; 108+ messages in thread
From: Christoph Hellwig @ 2009-11-17 10:27 UTC (permalink / raw)
  To: KOSAKI Motohiro; +Cc: Christoph Hellwig, linux-mm, LKML, Andrew Morton

On Tue, Nov 17, 2009 at 07:24:42PM +0900, KOSAKI Motohiro wrote:
> if xfsbufd doesn't only write out dirty data but also drop page,
> I agree you. 

It then drops the reference to the buffer which drops references to the
pages, which often are the last references, yes.


^ permalink raw reply	[flat|nested] 108+ messages in thread

* Re: [PATCH 0/7] Kill PF_MEMALLOC abuse
@ 2009-11-17 10:27       ` Christoph Hellwig
  0 siblings, 0 replies; 108+ messages in thread
From: Christoph Hellwig @ 2009-11-17 10:27 UTC (permalink / raw)
  To: KOSAKI Motohiro; +Cc: Christoph Hellwig, linux-mm, LKML, Andrew Morton

On Tue, Nov 17, 2009 at 07:24:42PM +0900, KOSAKI Motohiro wrote:
> if xfsbufd doesn't only write out dirty data but also drop page,
> I agree you. 

It then drops the reference to the buffer which drops references to the
pages, which often are the last references, yes.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 108+ messages in thread

* Re: [PATCH 2/7] mmc: Don't use PF_MEMALLOC
  2009-11-17  7:17   ` KOSAKI Motohiro
  (?)
@ 2009-11-17 10:29     ` Alan Cox
  -1 siblings, 0 replies; 108+ messages in thread
From: Alan Cox @ 2009-11-17 10:29 UTC (permalink / raw)
  To: KOSAKI Motohiro; +Cc: LKML, kosaki.motohiro, linux-mm, Andrew Morton, linux-mmc

On Tue, 17 Nov 2009 16:17:50 +0900 (JST)
KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> wrote:

> Non MM subsystem must not use PF_MEMALLOC. Memory reclaim need few
> memory, anyone must not prevent it. Otherwise the system cause
> mysterious hang-up and/or OOM Killer invokation.

So now what happens if we are paging and all our memory is tied up for
writeback to a device or CIFS etc which can no longer allocate the memory
to complete the write out so the MM can reclaim ?

Am I missing something or is this patch set not addressing the case where
the writeback thread needs to inherit PF_MEMALLOC somehow (at least for
the I/O in question and those blocking it)

Alan

^ permalink raw reply	[flat|nested] 108+ messages in thread

* Re: [PATCH 2/7] mmc: Don't use PF_MEMALLOC
@ 2009-11-17 10:29     ` Alan Cox
  0 siblings, 0 replies; 108+ messages in thread
From: Alan Cox @ 2009-11-17 10:29 UTC (permalink / raw)
  Cc: LKML, kosaki.motohiro, linux-mm, Andrew Morton, linux-mmc

On Tue, 17 Nov 2009 16:17:50 +0900 (JST)
KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> wrote:

> Non MM subsystem must not use PF_MEMALLOC. Memory reclaim need few
> memory, anyone must not prevent it. Otherwise the system cause
> mysterious hang-up and/or OOM Killer invokation.

So now what happens if we are paging and all our memory is tied up for
writeback to a device or CIFS etc which can no longer allocate the memory
to complete the write out so the MM can reclaim ?

Am I missing something or is this patch set not addressing the case where
the writeback thread needs to inherit PF_MEMALLOC somehow (at least for
the I/O in question and those blocking it)

Alan

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 108+ messages in thread

* Re: [PATCH 2/7] mmc: Don't use PF_MEMALLOC
@ 2009-11-17 10:29     ` Alan Cox
  0 siblings, 0 replies; 108+ messages in thread
From: Alan Cox @ 2009-11-17 10:29 UTC (permalink / raw)
  To: KOSAKI Motohiro; +Cc: LKML, linux-mm, Andrew Morton, linux-mmc

On Tue, 17 Nov 2009 16:17:50 +0900 (JST)
KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> wrote:

> Non MM subsystem must not use PF_MEMALLOC. Memory reclaim need few
> memory, anyone must not prevent it. Otherwise the system cause
> mysterious hang-up and/or OOM Killer invokation.

So now what happens if we are paging and all our memory is tied up for
writeback to a device or CIFS etc which can no longer allocate the memory
to complete the write out so the MM can reclaim ?

Am I missing something or is this patch set not addressing the case where
the writeback thread needs to inherit PF_MEMALLOC somehow (at least for
the I/O in question and those blocking it)

Alan

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 108+ messages in thread

* Re: [PATCH 2/7] mmc: Don't use PF_MEMALLOC
  2009-11-17 10:29     ` Alan Cox
@ 2009-11-17 10:32       ` Minchan Kim
  -1 siblings, 0 replies; 108+ messages in thread
From: Minchan Kim @ 2009-11-17 10:32 UTC (permalink / raw)
  To: Alan Cox; +Cc: KOSAKI Motohiro, LKML, linux-mm, Andrew Morton, linux-mmc

On Tue, Nov 17, 2009 at 7:29 PM, Alan Cox <alan@lxorguk.ukuu.org.uk> wrote:
> On Tue, 17 Nov 2009 16:17:50 +0900 (JST)
> KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> wrote:
>
>> Non MM subsystem must not use PF_MEMALLOC. Memory reclaim need few
>> memory, anyone must not prevent it. Otherwise the system cause
>> mysterious hang-up and/or OOM Killer invokation.
>
> So now what happens if we are paging and all our memory is tied up for
> writeback to a device or CIFS etc which can no longer allocate the memory
> to complete the write out so the MM can reclaim ?
>
> Am I missing something or is this patch set not addressing the case where
> the writeback thread needs to inherit PF_MEMALLOC somehow (at least for
> the I/O in question and those blocking it)
>

I agree.
At least, drivers for writeout is proper for using PF_MEMALLOC, I think.


> Alan
>
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo@kvack.org.  For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
>



-- 
Kind regards,
Minchan Kim

^ permalink raw reply	[flat|nested] 108+ messages in thread

* Re: [PATCH 2/7] mmc: Don't use PF_MEMALLOC
@ 2009-11-17 10:32       ` Minchan Kim
  0 siblings, 0 replies; 108+ messages in thread
From: Minchan Kim @ 2009-11-17 10:32 UTC (permalink / raw)
  To: Alan Cox; +Cc: KOSAKI Motohiro, LKML, linux-mm, Andrew Morton, linux-mmc

On Tue, Nov 17, 2009 at 7:29 PM, Alan Cox <alan@lxorguk.ukuu.org.uk> wrote:
> On Tue, 17 Nov 2009 16:17:50 +0900 (JST)
> KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> wrote:
>
>> Non MM subsystem must not use PF_MEMALLOC. Memory reclaim need few
>> memory, anyone must not prevent it. Otherwise the system cause
>> mysterious hang-up and/or OOM Killer invokation.
>
> So now what happens if we are paging and all our memory is tied up for
> writeback to a device or CIFS etc which can no longer allocate the memory
> to complete the write out so the MM can reclaim ?
>
> Am I missing something or is this patch set not addressing the case where
> the writeback thread needs to inherit PF_MEMALLOC somehow (at least for
> the I/O in question and those blocking it)
>

I agree.
At least, drivers for writeout is proper for using PF_MEMALLOC, I think.


> Alan
>
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo@kvack.org.  For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
>



-- 
Kind regards,
Minchan Kim

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 108+ messages in thread

* Re: [PATCH 2/7] mmc: Don't use PF_MEMALLOC
  2009-11-17 10:32       ` Minchan Kim
@ 2009-11-17 10:38         ` Oliver Neukum
  -1 siblings, 0 replies; 108+ messages in thread
From: Oliver Neukum @ 2009-11-17 10:38 UTC (permalink / raw)
  To: Minchan Kim
  Cc: Alan Cox, KOSAKI Motohiro, LKML, linux-mm, Andrew Morton, linux-mmc

Am Dienstag, 17. November 2009 11:32:36 schrieb Minchan Kim:
> On Tue, Nov 17, 2009 at 7:29 PM, Alan Cox <alan@lxorguk.ukuu.org.uk> wrote:
> > On Tue, 17 Nov 2009 16:17:50 +0900 (JST)
> >
> > KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> wrote:
> >> Non MM subsystem must not use PF_MEMALLOC. Memory reclaim need few
> >> memory, anyone must not prevent it. Otherwise the system cause
> >> mysterious hang-up and/or OOM Killer invokation.
> >
> > So now what happens if we are paging and all our memory is tied up for
> > writeback to a device or CIFS etc which can no longer allocate the memory
> > to complete the write out so the MM can reclaim ?
> >
> > Am I missing something or is this patch set not addressing the case where
> > the writeback thread needs to inherit PF_MEMALLOC somehow (at least for
> > the I/O in question and those blocking it)
> 
> I agree.
> At least, drivers for writeout is proper for using PF_MEMALLOC, I think.

For the same reason error handling should also use it, shouldn't it?

	Regards
		Oliver

^ permalink raw reply	[flat|nested] 108+ messages in thread

* Re: [PATCH 2/7] mmc: Don't use PF_MEMALLOC
@ 2009-11-17 10:38         ` Oliver Neukum
  0 siblings, 0 replies; 108+ messages in thread
From: Oliver Neukum @ 2009-11-17 10:38 UTC (permalink / raw)
  To: Minchan Kim
  Cc: Alan Cox, KOSAKI Motohiro, LKML, linux-mm, Andrew Morton, linux-mmc

Am Dienstag, 17. November 2009 11:32:36 schrieb Minchan Kim:
> On Tue, Nov 17, 2009 at 7:29 PM, Alan Cox <alan@lxorguk.ukuu.org.uk> wrote:
> > On Tue, 17 Nov 2009 16:17:50 +0900 (JST)
> >
> > KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> wrote:
> >> Non MM subsystem must not use PF_MEMALLOC. Memory reclaim need few
> >> memory, anyone must not prevent it. Otherwise the system cause
> >> mysterious hang-up and/or OOM Killer invokation.
> >
> > So now what happens if we are paging and all our memory is tied up for
> > writeback to a device or CIFS etc which can no longer allocate the memory
> > to complete the write out so the MM can reclaim ?
> >
> > Am I missing something or is this patch set not addressing the case where
> > the writeback thread needs to inherit PF_MEMALLOC somehow (at least for
> > the I/O in question and those blocking it)
> 
> I agree.
> At least, drivers for writeout is proper for using PF_MEMALLOC, I think.

For the same reason error handling should also use it, shouldn't it?

	Regards
		Oliver

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 108+ messages in thread

* Re: [PATCH 2/7] mmc: Don't use PF_MEMALLOC
  2009-11-17 10:29     ` Alan Cox
@ 2009-11-17 11:58       ` KOSAKI Motohiro
  -1 siblings, 0 replies; 108+ messages in thread
From: KOSAKI Motohiro @ 2009-11-17 11:58 UTC (permalink / raw)
  To: Alan Cox; +Cc: kosaki.motohiro, LKML, linux-mm, Andrew Morton, linux-mmc

> On Tue, 17 Nov 2009 16:17:50 +0900 (JST)
> KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> wrote:
> 
> > Non MM subsystem must not use PF_MEMALLOC. Memory reclaim need few
> > memory, anyone must not prevent it. Otherwise the system cause
> > mysterious hang-up and/or OOM Killer invokation.
> 
> So now what happens if we are paging and all our memory is tied up for
> writeback to a device or CIFS etc which can no longer allocate the memory
> to complete the write out so the MM can reclaim ?

Probably my answer is not so simple. sorry.

reason1: MM reclaim does both dropping clean memory and writing out dirty pages.
reason2: if all memory is exhausted, maybe we can't recover it. it is
fundamental limitation of Virtual Memory subsystem. and, min-watermark is
decided by number of system physcal memory, but # of I/O issue (i.e. # of
pages of used by writeback thread) is mainly decided # of devices. 
then, we can't gurantee min-watermark is sufficient on any systems.
Only reasonable solution is mempool like reservation, I think.
IOW, any reservation memory shouldn't share unrelated subsystem. otherwise
we lost any gurantee.

So, I think we need to hear why many developer don't use mempool,
instead use PF_MEMALLOC.

> Am I missing something or is this patch set not addressing the case where
> the writeback thread needs to inherit PF_MEMALLOC somehow (at least for
> the I/O in question and those blocking it)

Yes, probably my patchset isn't perfect. honestly I haven't understand
why so many developer prefer to use PF_MEMALLOC.




^ permalink raw reply	[flat|nested] 108+ messages in thread

* Re: [PATCH 2/7] mmc: Don't use PF_MEMALLOC
@ 2009-11-17 11:58       ` KOSAKI Motohiro
  0 siblings, 0 replies; 108+ messages in thread
From: KOSAKI Motohiro @ 2009-11-17 11:58 UTC (permalink / raw)
  To: Alan Cox; +Cc: kosaki.motohiro, LKML, linux-mm, Andrew Morton, linux-mmc

> On Tue, 17 Nov 2009 16:17:50 +0900 (JST)
> KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> wrote:
> 
> > Non MM subsystem must not use PF_MEMALLOC. Memory reclaim need few
> > memory, anyone must not prevent it. Otherwise the system cause
> > mysterious hang-up and/or OOM Killer invokation.
> 
> So now what happens if we are paging and all our memory is tied up for
> writeback to a device or CIFS etc which can no longer allocate the memory
> to complete the write out so the MM can reclaim ?

Probably my answer is not so simple. sorry.

reason1: MM reclaim does both dropping clean memory and writing out dirty pages.
reason2: if all memory is exhausted, maybe we can't recover it. it is
fundamental limitation of Virtual Memory subsystem. and, min-watermark is
decided by number of system physcal memory, but # of I/O issue (i.e. # of
pages of used by writeback thread) is mainly decided # of devices. 
then, we can't gurantee min-watermark is sufficient on any systems.
Only reasonable solution is mempool like reservation, I think.
IOW, any reservation memory shouldn't share unrelated subsystem. otherwise
we lost any gurantee.

So, I think we need to hear why many developer don't use mempool,
instead use PF_MEMALLOC.

> Am I missing something or is this patch set not addressing the case where
> the writeback thread needs to inherit PF_MEMALLOC somehow (at least for
> the I/O in question and those blocking it)

Yes, probably my patchset isn't perfect. honestly I haven't understand
why so many developer prefer to use PF_MEMALLOC.



--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 108+ messages in thread

* Re: [PATCH 0/7] Kill PF_MEMALLOC abuse
  2009-11-17 10:27       ` Christoph Hellwig
@ 2009-11-17 12:24         ` KOSAKI Motohiro
  -1 siblings, 0 replies; 108+ messages in thread
From: KOSAKI Motohiro @ 2009-11-17 12:24 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: kosaki.motohiro, linux-mm, LKML, Andrew Morton

> On Tue, Nov 17, 2009 at 07:24:42PM +0900, KOSAKI Motohiro wrote:
> > if xfsbufd doesn't only write out dirty data but also drop page,
> > I agree you. 
> 
> It then drops the reference to the buffer which drops references to the
> pages, which often are the last references, yes.

I though it is not typical case. Am I wrong?
if so, I'm sorry. I'm not XFS expert.




^ permalink raw reply	[flat|nested] 108+ messages in thread

* Re: [PATCH 0/7] Kill PF_MEMALLOC abuse
@ 2009-11-17 12:24         ` KOSAKI Motohiro
  0 siblings, 0 replies; 108+ messages in thread
From: KOSAKI Motohiro @ 2009-11-17 12:24 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: kosaki.motohiro, linux-mm, LKML, Andrew Morton

> On Tue, Nov 17, 2009 at 07:24:42PM +0900, KOSAKI Motohiro wrote:
> > if xfsbufd doesn't only write out dirty data but also drop page,
> > I agree you. 
> 
> It then drops the reference to the buffer which drops references to the
> pages, which often are the last references, yes.

I though it is not typical case. Am I wrong?
if so, I'm sorry. I'm not XFS expert.



--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 108+ messages in thread

* Re: [PATCH 0/7] Kill PF_MEMALLOC abuse
  2009-11-17 12:24         ` KOSAKI Motohiro
@ 2009-11-17 12:47           ` Christoph Hellwig
  -1 siblings, 0 replies; 108+ messages in thread
From: Christoph Hellwig @ 2009-11-17 12:47 UTC (permalink / raw)
  To: KOSAKI Motohiro; +Cc: Christoph Hellwig, linux-mm, LKML, Andrew Morton

On Tue, Nov 17, 2009 at 09:24:24PM +0900, KOSAKI Motohiro wrote:
> > On Tue, Nov 17, 2009 at 07:24:42PM +0900, KOSAKI Motohiro wrote:
> > > if xfsbufd doesn't only write out dirty data but also drop page,
> > > I agree you. 
> > 
> > It then drops the reference to the buffer which drops references to the
> > pages, which often are the last references, yes.
> 
> I though it is not typical case. Am I wrong?
> if so, I'm sorry. I'm not XFS expert.

I think in the typical case it's the last reference.  The are two
reasons why it might not be:

 - we're on a filesystem with block size < page size in which case two
   buffers can share a page and we'd need to write out and release both
   buffers to free the page
 - someone else might have a reference on the buffer.  Offhand I can't
   remember a place where we do this for delayed write buffers (which
   is what xfsbufd writes out) as it would be a bit against the purpose
   of those delayed write buffers.


^ permalink raw reply	[flat|nested] 108+ messages in thread

* Re: [PATCH 0/7] Kill PF_MEMALLOC abuse
@ 2009-11-17 12:47           ` Christoph Hellwig
  0 siblings, 0 replies; 108+ messages in thread
From: Christoph Hellwig @ 2009-11-17 12:47 UTC (permalink / raw)
  To: KOSAKI Motohiro; +Cc: Christoph Hellwig, linux-mm, LKML, Andrew Morton

On Tue, Nov 17, 2009 at 09:24:24PM +0900, KOSAKI Motohiro wrote:
> > On Tue, Nov 17, 2009 at 07:24:42PM +0900, KOSAKI Motohiro wrote:
> > > if xfsbufd doesn't only write out dirty data but also drop page,
> > > I agree you. 
> > 
> > It then drops the reference to the buffer which drops references to the
> > pages, which often are the last references, yes.
> 
> I though it is not typical case. Am I wrong?
> if so, I'm sorry. I'm not XFS expert.

I think in the typical case it's the last reference.  The are two
reasons why it might not be:

 - we're on a filesystem with block size < page size in which case two
   buffers can share a page and we'd need to write out and release both
   buffers to free the page
 - someone else might have a reference on the buffer.  Offhand I can't
   remember a place where we do this for delayed write buffers (which
   is what xfsbufd writes out) as it would be a bit against the purpose
   of those delayed write buffers.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 108+ messages in thread

* Re: [PATCH 6/7] cifs: Don't use PF_MEMALLOC
  2009-11-17  7:22   ` KOSAKI Motohiro
@ 2009-11-17 12:47     ` Jeff Layton
  -1 siblings, 0 replies; 108+ messages in thread
From: Jeff Layton @ 2009-11-17 12:47 UTC (permalink / raw)
  To: KOSAKI Motohiro
  Cc: LKML, samba-technical, Steve French, linux-mm, kosaki.motohiro,
	Andrew Morton, linux-cifs-client

On Tue, 17 Nov 2009 16:22:32 +0900 (JST)
KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> wrote:

> 
> Non MM subsystem must not use PF_MEMALLOC. Memory reclaim need few
> memory, anyone must not prevent it. Otherwise the system cause
> mysterious hang-up and/or OOM Killer invokation.
> 
> Cc: Steve French <sfrench@samba.org>
> Cc: linux-cifs-client@lists.samba.org
> Cc: samba-technical@lists.samba.org
> Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
> ---
>  fs/cifs/connect.c |    1 -
>  1 files changed, 0 insertions(+), 1 deletions(-)
> 
> diff --git a/fs/cifs/connect.c b/fs/cifs/connect.c
> index 63ea83f..f9b1553 100644
> --- a/fs/cifs/connect.c
> +++ b/fs/cifs/connect.c
> @@ -337,7 +337,6 @@ cifs_demultiplex_thread(struct TCP_Server_Info *server)
>  	bool isMultiRsp;
>  	int reconnect;
>  
> -	current->flags |= PF_MEMALLOC;
>  	cFYI(1, ("Demultiplex PID: %d", task_pid_nr(current)));
>  
>  	length = atomic_inc_return(&tcpSesAllocCount);

This patch appears to be safe for CIFS. I believe that the demultiplex
thread only does mempool allocations currently. The only other case
where it did an allocation was recently changed with the conversion of
the oplock break code to use slow_work.

Barring anything I've missed...

Acked-by: Jeff Layton <jlayton@redhat.com>

^ permalink raw reply	[flat|nested] 108+ messages in thread

* Re: [PATCH 6/7] cifs: Don't use PF_MEMALLOC
@ 2009-11-17 12:47     ` Jeff Layton
  0 siblings, 0 replies; 108+ messages in thread
From: Jeff Layton @ 2009-11-17 12:47 UTC (permalink / raw)
  To: KOSAKI Motohiro
  Cc: LKML, samba-technical, Steve French, linux-mm, Andrew Morton,
	linux-cifs-client

On Tue, 17 Nov 2009 16:22:32 +0900 (JST)
KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> wrote:

> 
> Non MM subsystem must not use PF_MEMALLOC. Memory reclaim need few
> memory, anyone must not prevent it. Otherwise the system cause
> mysterious hang-up and/or OOM Killer invokation.
> 
> Cc: Steve French <sfrench@samba.org>
> Cc: linux-cifs-client@lists.samba.org
> Cc: samba-technical@lists.samba.org
> Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
> ---
>  fs/cifs/connect.c |    1 -
>  1 files changed, 0 insertions(+), 1 deletions(-)
> 
> diff --git a/fs/cifs/connect.c b/fs/cifs/connect.c
> index 63ea83f..f9b1553 100644
> --- a/fs/cifs/connect.c
> +++ b/fs/cifs/connect.c
> @@ -337,7 +337,6 @@ cifs_demultiplex_thread(struct TCP_Server_Info *server)
>  	bool isMultiRsp;
>  	int reconnect;
>  
> -	current->flags |= PF_MEMALLOC;
>  	cFYI(1, ("Demultiplex PID: %d", task_pid_nr(current)));
>  
>  	length = atomic_inc_return(&tcpSesAllocCount);

This patch appears to be safe for CIFS. I believe that the demultiplex
thread only does mempool allocations currently. The only other case
where it did an allocation was recently changed with the conversion of
the oplock break code to use slow_work.

Barring anything I've missed...

Acked-by: Jeff Layton <jlayton@redhat.com>

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 108+ messages in thread

* Re: [PATCH 2/7] mmc: Don't use PF_MEMALLOC
  2009-11-17 11:58       ` KOSAKI Motohiro
@ 2009-11-17 12:51         ` Minchan Kim
  -1 siblings, 0 replies; 108+ messages in thread
From: Minchan Kim @ 2009-11-17 12:51 UTC (permalink / raw)
  To: KOSAKI Motohiro; +Cc: Alan Cox, LKML, linux-mm, Andrew Morton, linux-mmc

Sorry for the noise. 
While I am typing, my mail client already send the mail. :(.
This is genuine.

KOSAKI Motohiro wrote:
>> On Tue, 17 Nov 2009 16:17:50 +0900 (JST)
>> KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> wrote:
>>
>>> Non MM subsystem must not use PF_MEMALLOC. Memory reclaim need few
>>> memory, anyone must not prevent it. Otherwise the system cause
>>> mysterious hang-up and/or OOM Killer invokation.
>> So now what happens if we are paging and all our memory is tied up for
>> writeback to a device or CIFS etc which can no longer allocate the memory
>> to complete the write out so the MM can reclaim ?
> 
> Probably my answer is not so simple. sorry.
> 
> reason1: MM reclaim does both dropping clean memory and writing out dirty pages.

Who write out dirty pages?
If block driver can't allocate pages for flushing, It means VM can't reclaim
dirty pages after all.

> reason2: if all memory is exhausted, maybe we can't recover it. it is
> fundamental limitation of Virtual Memory subsystem. and, min-watermark is
> decided by number of system physcal memory, but # of I/O issue (i.e. # of
> pages of used by writeback thread) is mainly decided # of devices. 
> then, we can't gurantee min-watermark is sufficient on any systems.
> Only reasonable solution is mempool like reservation, I think.

I think it's because mempool reserves memory. 
(# of I/O issue\0 is hard to be expected.
How do we determine mempool size of each block driver?
For example,  maybe, server use few I/O for nand. 
but embedded system uses a lot of I/O. 

We need another knob for each block driver?

I understand your point. but it's not simple. 
I think, for making sure VM's pages, block driver need to distinguish 
normal flush path and flush patch for reclaiming. 
So In case of flushing for reclaiming, block driver have to set PF_MEMALLOC. 
otherwise, it shouldn't set PF_MEMALLOC.


> IOW, any reservation memory shouldn't share unrelated subsystem. otherwise
> we lost any gurantee.
> 
> So, I think we need to hear why many developer don't use mempool,
> instead use PF_MEMALLOC.
> 
>> Am I missing something or is this patch set not addressing the case where
>> the writeback thread needs to inherit PF_MEMALLOC somehow (at least for
>> the I/O in question and those blocking it)
> 
> Yes, probably my patchset isn't perfect. honestly I haven't understand
> why so many developer prefer to use PF_MEMALLOC.
> 
> 
> 
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo@kvack.org.  For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>


^ permalink raw reply	[flat|nested] 108+ messages in thread

* Re: [PATCH 2/7] mmc: Don't use PF_MEMALLOC
@ 2009-11-17 12:51         ` Minchan Kim
  0 siblings, 0 replies; 108+ messages in thread
From: Minchan Kim @ 2009-11-17 12:51 UTC (permalink / raw)
  To: KOSAKI Motohiro; +Cc: Alan Cox, LKML, linux-mm, Andrew Morton, linux-mmc

Sorry for the noise. 
While I am typing, my mail client already send the mail. :(.
This is genuine.

KOSAKI Motohiro wrote:
>> On Tue, 17 Nov 2009 16:17:50 +0900 (JST)
>> KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> wrote:
>>
>>> Non MM subsystem must not use PF_MEMALLOC. Memory reclaim need few
>>> memory, anyone must not prevent it. Otherwise the system cause
>>> mysterious hang-up and/or OOM Killer invokation.
>> So now what happens if we are paging and all our memory is tied up for
>> writeback to a device or CIFS etc which can no longer allocate the memory
>> to complete the write out so the MM can reclaim ?
> 
> Probably my answer is not so simple. sorry.
> 
> reason1: MM reclaim does both dropping clean memory and writing out dirty pages.

Who write out dirty pages?
If block driver can't allocate pages for flushing, It means VM can't reclaim
dirty pages after all.

> reason2: if all memory is exhausted, maybe we can't recover it. it is
> fundamental limitation of Virtual Memory subsystem. and, min-watermark is
> decided by number of system physcal memory, but # of I/O issue (i.e. # of
> pages of used by writeback thread) is mainly decided # of devices. 
> then, we can't gurantee min-watermark is sufficient on any systems.
> Only reasonable solution is mempool like reservation, I think.

I think it's because mempool reserves memory. 
(# of I/O issue\0 is hard to be expected.
How do we determine mempool size of each block driver?
For example,  maybe, server use few I/O for nand. 
but embedded system uses a lot of I/O. 

We need another knob for each block driver?

I understand your point. but it's not simple. 
I think, for making sure VM's pages, block driver need to distinguish 
normal flush path and flush patch for reclaiming. 
So In case of flushing for reclaiming, block driver have to set PF_MEMALLOC. 
otherwise, it shouldn't set PF_MEMALLOC.


> IOW, any reservation memory shouldn't share unrelated subsystem. otherwise
> we lost any gurantee.
> 
> So, I think we need to hear why many developer don't use mempool,
> instead use PF_MEMALLOC.
> 
>> Am I missing something or is this patch set not addressing the case where
>> the writeback thread needs to inherit PF_MEMALLOC somehow (at least for
>> the I/O in question and those blocking it)
> 
> Yes, probably my patchset isn't perfect. honestly I haven't understand
> why so many developer prefer to use PF_MEMALLOC.
> 
> 
> 
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo@kvack.org.  For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 108+ messages in thread

* Re: [PATCH 1/7] dm: use __GFP_HIGH instead PF_MEMALLOC
  2009-11-17  7:17   ` KOSAKI Motohiro
  (?)
@ 2009-11-17 13:15     ` Alasdair G Kergon
  -1 siblings, 0 replies; 108+ messages in thread
From: Alasdair G Kergon @ 2009-11-17 13:15 UTC (permalink / raw)
  To: KOSAKI Motohiro; +Cc: LKML, linux-mm, Andrew Morton, dm-devel

On Tue, Nov 17, 2009 at 04:17:07PM +0900, KOSAKI Motohiro wrote:
> Non MM subsystem must not use PF_MEMALLOC. Memory reclaim need few
> memory, anyone must not prevent it. Otherwise the system cause
> mysterious hang-up and/or OOM Killer invokation.
 
This code is also on the critical path, for example, if you are swapping
onto a dm device.  (There are ways we could reduce its use further as
not every dm ioctl needs to be on the critical path and the buffer size
could be limited for the ioctls that do.)

But what situations have been causing you trouble?  The OOM killer must
generally avoid killing userspace processes that suspend & resume dm
devices, and there are tight restrictions on what those processes
can do safely between suspending and resuming.

Alasdair


^ permalink raw reply	[flat|nested] 108+ messages in thread

* Re: [PATCH 1/7] dm: use __GFP_HIGH instead PF_MEMALLOC
@ 2009-11-17 13:15     ` Alasdair G Kergon
  0 siblings, 0 replies; 108+ messages in thread
From: Alasdair G Kergon @ 2009-11-17 13:15 UTC (permalink / raw)
  To: KOSAKI Motohiro; +Cc: LKML, linux-mm, Andrew Morton, dm-devel

On Tue, Nov 17, 2009 at 04:17:07PM +0900, KOSAKI Motohiro wrote:
> Non MM subsystem must not use PF_MEMALLOC. Memory reclaim need few
> memory, anyone must not prevent it. Otherwise the system cause
> mysterious hang-up and/or OOM Killer invokation.
 
This code is also on the critical path, for example, if you are swapping
onto a dm device.  (There are ways we could reduce its use further as
not every dm ioctl needs to be on the critical path and the buffer size
could be limited for the ioctls that do.)

But what situations have been causing you trouble?  The OOM killer must
generally avoid killing userspace processes that suspend & resume dm
devices, and there are tight restrictions on what those processes
can do safely between suspending and resuming.

Alasdair

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 108+ messages in thread

* Re: [PATCH 1/7] dm: use __GFP_HIGH instead PF_MEMALLOC
@ 2009-11-17 13:15     ` Alasdair G Kergon
  0 siblings, 0 replies; 108+ messages in thread
From: Alasdair G Kergon @ 2009-11-17 13:15 UTC (permalink / raw)
  To: KOSAKI Motohiro; +Cc: linux-mm, Andrew Morton, LKML, dm-devel

On Tue, Nov 17, 2009 at 04:17:07PM +0900, KOSAKI Motohiro wrote:
> Non MM subsystem must not use PF_MEMALLOC. Memory reclaim need few
> memory, anyone must not prevent it. Otherwise the system cause
> mysterious hang-up and/or OOM Killer invokation.
 
This code is also on the critical path, for example, if you are swapping
onto a dm device.  (There are ways we could reduce its use further as
not every dm ioctl needs to be on the critical path and the buffer size
could be limited for the ioctls that do.)

But what situations have been causing you trouble?  The OOM killer must
generally avoid killing userspace processes that suspend & resume dm
devices, and there are tight restrictions on what those processes
can do safely between suspending and resuming.

Alasdair

^ permalink raw reply	[flat|nested] 108+ messages in thread

* Re: [PATCH 6/7] cifs: Don't use PF_MEMALLOC
  2009-11-17 12:47     ` Jeff Layton
  (?)
@ 2009-11-17 16:40     ` Steve French
  2009-11-18  6:31         ` KOSAKI Motohiro
  -1 siblings, 1 reply; 108+ messages in thread
From: Steve French @ 2009-11-17 16:40 UTC (permalink / raw)
  To: Jeff Layton
  Cc: samba-technical, LKML, Steve French, linux-mm, KOSAKI Motohiro,
	Andrew Morton, linux-cifs-client

It is hard to follow exactly what this flag does in /mm (other than try
harder on memory allocations) - I haven't found much about this flag (e.g.
http://lwn.net/Articles/246928/) but it does look like most of the fs no
longer set this (except xfs) e.g. ext3_ordered_writepage.  When running out
of memory in the cifs_demultiplex_thread it will retry 3 seconds later, but
if memory allocations ever fail in this path we could potentially be holding
up (an already issued write in) writepages for that period by not having
memory to get the response to see if the write succeeded.

We pass in few flags for these memory allocation requests: GFP_NOFS (on the
mempool_alloc) and SLAB_HWCACHE_ALIGN (on the kmem_cache_create of the pool)
should we be passing in other flags on the allocations?

On Tue, Nov 17, 2009 at 6:47 AM, Jeff Layton <jlayton@redhat.com> wrote:

> On Tue, 17 Nov 2009 16:22:32 +0900 (JST)
> KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> wrote:
>
> >
> > Non MM subsystem must not use PF_MEMALLOC. Memory reclaim need few
> > memory, anyone must not prevent it. Otherwise the system cause
> > mysterious hang-up and/or OOM Killer invokation.
> >
> > Cc: Steve French <sfrench@samba.org>
> > Cc: linux-cifs-client@lists.samba.org
> > Cc: samba-technical@lists.samba.org
> > Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
> > ---
> >  fs/cifs/connect.c |    1 -
> >  1 files changed, 0 insertions(+), 1 deletions(-)
> >
> > diff --git a/fs/cifs/connect.c b/fs/cifs/connect.c
> > index 63ea83f..f9b1553 100644
> > --- a/fs/cifs/connect.c
> > +++ b/fs/cifs/connect.c
> > @@ -337,7 +337,6 @@ cifs_demultiplex_thread(struct TCP_Server_Info
> *server)
> >       bool isMultiRsp;
> >       int reconnect;
> >
> > -     current->flags |= PF_MEMALLOC;
> >       cFYI(1, ("Demultiplex PID: %d", task_pid_nr(current)));
> >
> >       length = atomic_inc_return(&tcpSesAllocCount);
>
> This patch appears to be safe for CIFS. I believe that the demultiplex
> thread only does mempool allocations currently. The only other case
> where it did an allocation was recently changed with the conversion of
> the oplock break code to use slow_work.
>
> Barring anything I've missed...
>
> Acked-by: Jeff Layton <jlayton@redhat.com>
>



-- 
Thanks,

Steve

^ permalink raw reply	[flat|nested] 108+ messages in thread

* Re: [PATCH 6/7] cifs: Don't use PF_MEMALLOC
  2009-11-17 12:47     ` Jeff Layton
  (?)
  (?)
@ 2009-11-17 16:41     ` Steve French
  -1 siblings, 0 replies; 108+ messages in thread
From: Steve French @ 2009-11-17 16:41 UTC (permalink / raw)
  To: LKML

It is hard to follow exactly what this flag does in /mm (other than
try harder on memory allocations) - I haven't found much about this
flag (e.g. http://lwn.net/Articles/246928/) but it does look like most
of the fs no longer set this (except xfs) e.g. ext3_ordered_writepage.
 When running out of memory in the cifs_demultiplex_thread it will
retry 3 seconds later, but if memory allocations ever fail in this
path we could potentially be holding up (an already issued write in)
writepages for that period by not having memory to get the response to
see if the write succeeded.

We pass in few flags for these memory allocation requests: GFP_NOFS
(on the mempool_alloc) and SLAB_HWCACHE_ALIGN (on the
kmem_cache_create of the pool) should we be passing in other flags on
the allocations?

On Tue, Nov 17, 2009 at 6:47 AM, Jeff Layton <jlayton@redhat.com> wrote:
>
> On Tue, 17 Nov 2009 16:22:32 +0900 (JST)
> KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> wrote:
>
> >
> > Non MM subsystem must not use PF_MEMALLOC. Memory reclaim need few
> > memory, anyone must not prevent it. Otherwise the system cause
> > mysterious hang-up and/or OOM Killer invokation.
> >
> > Cc: Steve French <sfrench@samba.org>
> > Cc: linux-cifs-client@lists.samba.org
> > Cc: samba-technical@lists.samba.org
> > Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
> > ---
> >  fs/cifs/connect.c |    1 -
> >  1 files changed, 0 insertions(+), 1 deletions(-)
> >
> > diff --git a/fs/cifs/connect.c b/fs/cifs/connect.c
> > index 63ea83f..f9b1553 100644
> > --- a/fs/cifs/connect.c
> > +++ b/fs/cifs/connect.c
> > @@ -337,7 +337,6 @@ cifs_demultiplex_thread(struct TCP_Server_Info *server)
> >       bool isMultiRsp;
> >       int reconnect;
> >
> > -     current->flags |= PF_MEMALLOC;
> >       cFYI(1, ("Demultiplex PID: %d", task_pid_nr(current)));
> >
> >       length = atomic_inc_return(&tcpSesAllocCount);
>
> This patch appears to be safe for CIFS. I believe that the demultiplex
> thread only does mempool allocations currently. The only other case
> where it did an allocation was recently changed with the conversion of
> the oplock break code to use slow_work.
>
> Barring anything I've missed...
>
> Acked-by: Jeff Layton <jlayton@redhat.com>



--
Thanks,

Steve

^ permalink raw reply	[flat|nested] 108+ messages in thread

* Re: [PATCH 2/7] mmc: Don't use PF_MEMALLOC
  2009-11-17 12:51         ` Minchan Kim
@ 2009-11-17 20:47           ` Peter Zijlstra
  -1 siblings, 0 replies; 108+ messages in thread
From: Peter Zijlstra @ 2009-11-17 20:47 UTC (permalink / raw)
  To: Minchan Kim
  Cc: KOSAKI Motohiro, Alan Cox, LKML, linux-mm, Andrew Morton, linux-mmc

On Tue, 2009-11-17 at 21:51 +0900, Minchan Kim wrote:
> I think it's because mempool reserves memory. 
> (# of I/O issue\0 is hard to be expected.
> How do we determine mempool size of each block driver?
> For example,  maybe, server use few I/O for nand. 
> but embedded system uses a lot of I/O. 

No, you scale the mempool to the minimum amount required to make
progress -- this includes limiting the 'concurrency' when handing out
mempool objects.

If you run into such tight corners often enough to notice it, there's
something else wrong.

I fully agree with ripping out PF_MEMALLOC from pretty much everything,
including the VM, getting rid of the various abuse outside of the VM
seems like a very good start.




^ permalink raw reply	[flat|nested] 108+ messages in thread

* Re: [PATCH 2/7] mmc: Don't use PF_MEMALLOC
@ 2009-11-17 20:47           ` Peter Zijlstra
  0 siblings, 0 replies; 108+ messages in thread
From: Peter Zijlstra @ 2009-11-17 20:47 UTC (permalink / raw)
  To: Minchan Kim
  Cc: KOSAKI Motohiro, Alan Cox, LKML, linux-mm, Andrew Morton, linux-mmc

On Tue, 2009-11-17 at 21:51 +0900, Minchan Kim wrote:
> I think it's because mempool reserves memory. 
> (# of I/O issue\0 is hard to be expected.
> How do we determine mempool size of each block driver?
> For example,  maybe, server use few I/O for nand. 
> but embedded system uses a lot of I/O. 

No, you scale the mempool to the minimum amount required to make
progress -- this includes limiting the 'concurrency' when handing out
mempool objects.

If you run into such tight corners often enough to notice it, there's
something else wrong.

I fully agree with ripping out PF_MEMALLOC from pretty much everything,
including the VM, getting rid of the various abuse outside of the VM
seems like a very good start.



--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 108+ messages in thread

* Re: [PATCH 0/7] Kill PF_MEMALLOC abuse
  2009-11-17  8:33     ` KOSAKI Motohiro
@ 2009-11-17 20:56       ` Peter Zijlstra
  -1 siblings, 0 replies; 108+ messages in thread
From: Peter Zijlstra @ 2009-11-17 20:56 UTC (permalink / raw)
  To: KOSAKI Motohiro; +Cc: David Rientjes, linux-mm, LKML, Andrew Morton

On Tue, 2009-11-17 at 17:33 +0900, KOSAKI Motohiro wrote:
> 
> if there is so such reason. we might need to implement another MM trick.
> but keeping this strage usage is not a option. All memory freeing activity
> (e.g. page out, task killing) need some memory. we need to protect its
> emergency memory. otherwise linux reliability decrease dramatically when
> the system face to memory stress. 

In general PF_MEMALLOC is a particularly bad idea, even for the VM when
not coupled with limiting the consumption. That is one should make an
upper-bound estimation of the memory needed for a writeout-path per
page, and reserve a small multiple thereof, and limit the number of
pages written out so as to never exceed this estimate.

If the current mempool interface isn't sufficient (not hard to imagine),
look at the swap over NFS patch-set, that includes a much more able
reservation scheme, and accounting framework.




^ permalink raw reply	[flat|nested] 108+ messages in thread

* Re: [PATCH 0/7] Kill PF_MEMALLOC abuse
@ 2009-11-17 20:56       ` Peter Zijlstra
  0 siblings, 0 replies; 108+ messages in thread
From: Peter Zijlstra @ 2009-11-17 20:56 UTC (permalink / raw)
  To: KOSAKI Motohiro; +Cc: David Rientjes, linux-mm, LKML, Andrew Morton

On Tue, 2009-11-17 at 17:33 +0900, KOSAKI Motohiro wrote:
> 
> if there is so such reason. we might need to implement another MM trick.
> but keeping this strage usage is not a option. All memory freeing activity
> (e.g. page out, task killing) need some memory. we need to protect its
> emergency memory. otherwise linux reliability decrease dramatically when
> the system face to memory stress. 

In general PF_MEMALLOC is a particularly bad idea, even for the VM when
not coupled with limiting the consumption. That is one should make an
upper-bound estimation of the memory needed for a writeout-path per
page, and reserve a small multiple thereof, and limit the number of
pages written out so as to never exceed this estimate.

If the current mempool interface isn't sufficient (not hard to imagine),
look at the swap over NFS patch-set, that includes a much more able
reservation scheme, and accounting framework.



--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 108+ messages in thread

* Re: [PATCH 7/7] xfs: Don't use PF_MEMALLOC
  2009-11-17  7:23   ` KOSAKI Motohiro
  (?)
  (?)
@ 2009-11-17 22:11     ` Dave Chinner
  -1 siblings, 0 replies; 108+ messages in thread
From: Dave Chinner @ 2009-11-17 22:11 UTC (permalink / raw)
  To: KOSAKI Motohiro
  Cc: LKML, linux-mm, Andrew Morton, Christoph Hellwig, linux-fsdevel,
	xfs-masters, xfs

On Tue, Nov 17, 2009 at 04:23:43PM +0900, KOSAKI Motohiro wrote:
> 
> Non MM subsystem must not use PF_MEMALLOC. Memory reclaim need few
> memory, anyone must not prevent it. Otherwise the system cause
> mysterious hang-up and/or OOM Killer invokation.

The xfsbufd is a woken run by a registered memory shaker. i.e. it
runs when the system needs to reclaim memory. It forceѕ the
delayed write metadata buffers (of which there can be a lot) to disk
so that they can be reclaimed on IO completion. This IO submission
may require ѕome memory to be allocated to be able to free that
memory.

Hence, AFAICT the use of PF_MEMALLOC is valid here.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 108+ messages in thread

* Re: [PATCH 7/7] xfs: Don't use PF_MEMALLOC
@ 2009-11-17 22:11     ` Dave Chinner
  0 siblings, 0 replies; 108+ messages in thread
From: Dave Chinner @ 2009-11-17 22:11 UTC (permalink / raw)
  To: KOSAKI Motohiro
  Cc: LKML, linux-mm, Andrew Morton, Christoph Hellwig, linux-fsdevel,
	xfs-masters, xfs

On Tue, Nov 17, 2009 at 04:23:43PM +0900, KOSAKI Motohiro wrote:
> 
> Non MM subsystem must not use PF_MEMALLOC. Memory reclaim need few
> memory, anyone must not prevent it. Otherwise the system cause
> mysterious hang-up and/or OOM Killer invokation.

The xfsbufd is a woken run by a registered memory shaker. i.e. it
runs when the system needs to reclaim memory. It forceѕ the
delayed write metadata buffers (of which there can be a lot) to disk
so that they can be reclaimed on IO completion. This IO submission
may require ѕome memory to be allocated to be able to free that
memory.

Hence, AFAICT the use of PF_MEMALLOC is valid here.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 108+ messages in thread

* Re: [PATCH 7/7] xfs: Don't use PF_MEMALLOC
@ 2009-11-17 22:11     ` Dave Chinner
  0 siblings, 0 replies; 108+ messages in thread
From: Dave Chinner @ 2009-11-17 22:11 UTC (permalink / raw)
  To: KOSAKI Motohiro
  Cc: xfs-masters, LKML, xfs, Christoph Hellwig, linux-mm,
	linux-fsdevel, Andrew Morton

On Tue, Nov 17, 2009 at 04:23:43PM +0900, KOSAKI Motohiro wrote:
> 
> Non MM subsystem must not use PF_MEMALLOC. Memory reclaim need few
> memory, anyone must not prevent it. Otherwise the system cause
> mysterious hang-up and/or OOM Killer invokation.

The xfsbufd is a woken run by a registered memory shaker. i.e. it
runs when the system needs to reclaim memory. It forceѕ the
delayed write metadata buffers (of which there can be a lot) to disk
so that they can be reclaimed on IO completion. This IO submission
may require ѕome memory to be allocated to be able to free that
memory.

Hence, AFAICT the use of PF_MEMALLOC is valid here.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 108+ messages in thread

* Re: [PATCH 7/7] xfs: Don't use PF_MEMALLOC
@ 2009-11-17 22:11     ` Dave Chinner
  0 siblings, 0 replies; 108+ messages in thread
From: Dave Chinner @ 2009-11-17 22:11 UTC (permalink / raw)
  To: KOSAKI Motohiro
  Cc: LKML, linux-mm, Andrew Morton, Christoph Hellwig, linux-fsdevel,
	xfs-masters, xfs

On Tue, Nov 17, 2009 at 04:23:43PM +0900, KOSAKI Motohiro wrote:
> 
> Non MM subsystem must not use PF_MEMALLOC. Memory reclaim need few
> memory, anyone must not prevent it. Otherwise the system cause
> mysterious hang-up and/or OOM Killer invokation.

The xfsbufd is a woken run by a registered memory shaker. i.e. it
runs when the system needs to reclaim memory. It forceN? the
delayed write metadata buffers (of which there can be a lot) to disk
so that they can be reclaimed on IO completion. This IO submission
may require N?ome memory to be allocated to be able to free that
memory.

Hence, AFAICT the use of PF_MEMALLOC is valid here.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 108+ messages in thread

* Re: [PATCH 2/7] mmc: Don't use PF_MEMALLOC
  2009-11-17 20:47           ` Peter Zijlstra
@ 2009-11-18  0:01             ` Minchan Kim
  -1 siblings, 0 replies; 108+ messages in thread
From: Minchan Kim @ 2009-11-18  0:01 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: KOSAKI Motohiro, Alan Cox, LKML, linux-mm, Andrew Morton, linux-mmc

Hi, Peter.

First of all, Thanks for the commenting.

On Wed, Nov 18, 2009 at 5:47 AM, Peter Zijlstra <peterz@infradead.org> wrote:
> On Tue, 2009-11-17 at 21:51 +0900, Minchan Kim wrote:
>> I think it's because mempool reserves memory.
>> (# of I/O issue\0 is hard to be expected.
>> How do we determine mempool size of each block driver?
>> For example,  maybe, server use few I/O for nand.
>> but embedded system uses a lot of I/O.
>
> No, you scale the mempool to the minimum amount required to make
> progress -- this includes limiting the 'concurrency' when handing out
> mempool objects.
>
> If you run into such tight corners often enough to notice it, there's
> something else wrong.
>
> I fully agree with ripping out PF_MEMALLOC from pretty much everything,
> including the VM, getting rid of the various abuse outside of the VM
> seems like a very good start.
>

I am not against removing PF_MEMALLOC.
Totally, I agree to prevent abusing of PF_MEMALLOC.

What I have a concern is per-block mempool.
Although it's minimum amount of mempool, it can be increased
by adding new block driver. I am not sure how many we will have block driver.

And, person who develop new driver always have to use mempool and consider
what is minimum of mempool.
I think this is a problem of mempool, now.

How about this?
According to system memory, kernel have just one mempool for I/O which
is one shared by several block driver.

And we make new API block driver can use.
Of course, as usual It can use dynamic memoy. Only it can use mempool if
system don't have much dynamic memory.

In this case, we can control read/write path. read I/O can't help
memory reclaiming.
So I think read I/O don't use mempool, I am not sure. :)


-- 
Kind regards,
Minchan Kim

^ permalink raw reply	[flat|nested] 108+ messages in thread

* Re: [PATCH 2/7] mmc: Don't use PF_MEMALLOC
@ 2009-11-18  0:01             ` Minchan Kim
  0 siblings, 0 replies; 108+ messages in thread
From: Minchan Kim @ 2009-11-18  0:01 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: KOSAKI Motohiro, Alan Cox, LKML, linux-mm, Andrew Morton, linux-mmc

Hi, Peter.

First of all, Thanks for the commenting.

On Wed, Nov 18, 2009 at 5:47 AM, Peter Zijlstra <peterz@infradead.org> wrote:
> On Tue, 2009-11-17 at 21:51 +0900, Minchan Kim wrote:
>> I think it's because mempool reserves memory.
>> (# of I/O issue\0 is hard to be expected.
>> How do we determine mempool size of each block driver?
>> For example,  maybe, server use few I/O for nand.
>> but embedded system uses a lot of I/O.
>
> No, you scale the mempool to the minimum amount required to make
> progress -- this includes limiting the 'concurrency' when handing out
> mempool objects.
>
> If you run into such tight corners often enough to notice it, there's
> something else wrong.
>
> I fully agree with ripping out PF_MEMALLOC from pretty much everything,
> including the VM, getting rid of the various abuse outside of the VM
> seems like a very good start.
>

I am not against removing PF_MEMALLOC.
Totally, I agree to prevent abusing of PF_MEMALLOC.

What I have a concern is per-block mempool.
Although it's minimum amount of mempool, it can be increased
by adding new block driver. I am not sure how many we will have block driver.

And, person who develop new driver always have to use mempool and consider
what is minimum of mempool.
I think this is a problem of mempool, now.

How about this?
According to system memory, kernel have just one mempool for I/O which
is one shared by several block driver.

And we make new API block driver can use.
Of course, as usual It can use dynamic memoy. Only it can use mempool if
system don't have much dynamic memory.

In this case, we can control read/write path. read I/O can't help
memory reclaiming.
So I think read I/O don't use mempool, I am not sure. :)


-- 
Kind regards,
Minchan Kim

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 108+ messages in thread

* Re: [PATCH 0/7] Kill PF_MEMALLOC abuse
  2009-11-17 20:56       ` Peter Zijlstra
@ 2009-11-18  5:55         ` KOSAKI Motohiro
  -1 siblings, 0 replies; 108+ messages in thread
From: KOSAKI Motohiro @ 2009-11-18  5:55 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: kosaki.motohiro, David Rientjes, linux-mm, LKML, Andrew Morton

> On Tue, 2009-11-17 at 17:33 +0900, KOSAKI Motohiro wrote:
> > 
> > if there is so such reason. we might need to implement another MM trick.
> > but keeping this strage usage is not a option. All memory freeing activity
> > (e.g. page out, task killing) need some memory. we need to protect its
> > emergency memory. otherwise linux reliability decrease dramatically when
> > the system face to memory stress. 
> 
> In general PF_MEMALLOC is a particularly bad idea, even for the VM when
> not coupled with limiting the consumption. That is one should make an
> upper-bound estimation of the memory needed for a writeout-path per
> page, and reserve a small multiple thereof, and limit the number of
> pages written out so as to never exceed this estimate.
> 
> If the current mempool interface isn't sufficient (not hard to imagine),
> look at the swap over NFS patch-set, that includes a much more able
> reservation scheme, and accounting framework.

Yes, I agree.

In this discussion, some people explained why their subsystem need
emergency memory, but nobody claim sharing memory pool against VM and
surely want to stop reclaim (PF_MEMALLOC's big side effect).

OK. I try to review your patch carefully and remake this patch series on top
your reservation framework in swap-over-nfs patch series.




^ permalink raw reply	[flat|nested] 108+ messages in thread

* Re: [PATCH 0/7] Kill PF_MEMALLOC abuse
@ 2009-11-18  5:55         ` KOSAKI Motohiro
  0 siblings, 0 replies; 108+ messages in thread
From: KOSAKI Motohiro @ 2009-11-18  5:55 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: kosaki.motohiro, David Rientjes, linux-mm, LKML, Andrew Morton

> On Tue, 2009-11-17 at 17:33 +0900, KOSAKI Motohiro wrote:
> > 
> > if there is so such reason. we might need to implement another MM trick.
> > but keeping this strage usage is not a option. All memory freeing activity
> > (e.g. page out, task killing) need some memory. we need to protect its
> > emergency memory. otherwise linux reliability decrease dramatically when
> > the system face to memory stress. 
> 
> In general PF_MEMALLOC is a particularly bad idea, even for the VM when
> not coupled with limiting the consumption. That is one should make an
> upper-bound estimation of the memory needed for a writeout-path per
> page, and reserve a small multiple thereof, and limit the number of
> pages written out so as to never exceed this estimate.
> 
> If the current mempool interface isn't sufficient (not hard to imagine),
> look at the swap over NFS patch-set, that includes a much more able
> reservation scheme, and accounting framework.

Yes, I agree.

In this discussion, some people explained why their subsystem need
emergency memory, but nobody claim sharing memory pool against VM and
surely want to stop reclaim (PF_MEMALLOC's big side effect).

OK. I try to review your patch carefully and remake this patch series on top
your reservation framework in swap-over-nfs patch series.



--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 108+ messages in thread

* Re: [PATCH 1/7] dm: use __GFP_HIGH instead PF_MEMALLOC
  2009-11-17 13:15     ` Alasdair G Kergon
@ 2009-11-18  6:17       ` KOSAKI Motohiro
  -1 siblings, 0 replies; 108+ messages in thread
From: KOSAKI Motohiro @ 2009-11-18  6:17 UTC (permalink / raw)
  To: KOSAKI Motohiro, LKML, linux-mm, Andrew Morton, dm-devel; +Cc: kosaki.motohiro

Hi,

Thank you for give me comment.

> On Tue, Nov 17, 2009 at 04:17:07PM +0900, KOSAKI Motohiro wrote:
> > Non MM subsystem must not use PF_MEMALLOC. Memory reclaim need few
> > memory, anyone must not prevent it. Otherwise the system cause
> > mysterious hang-up and/or OOM Killer invokation.
>  
> This code is also on the critical path, for example, if you are swapping
> onto a dm device.  (There are ways we could reduce its use further as
> not every dm ioctl needs to be on the critical path and the buffer size
> could be limited for the ioctls that do.)

May I ask one additional question?
Original code is here.

	-------------------------------------------------------
        /*
         * Trying to avoid low memory issues when a device is
         * suspended.
         */
        current->flags |= PF_MEMALLOC;

        /*
         * Copy the parameters into kernel space.
         */
        r = copy_params(user, &param);

        current->flags &= ~PF_MEMALLOC;
	-------------------------------------------------------

but PF_MEMALLOC doesn't gurantee allocation successfull. In your case,
mempoll seems better to me. copy_params seems enough small function 
and we can rewrite it. Why didn't you use mempool?

Am I missing something?


> But what situations have been causing you trouble?  The OOM killer must
> generally avoid killing userspace processes that suspend & resume dm
> devices, and there are tight restrictions on what those processes
> can do safely between suspending and resuming.

No. This is theorical issue. but I really want to avoid stress weakness
kernel.






^ permalink raw reply	[flat|nested] 108+ messages in thread

* Re: [PATCH 1/7] dm: use __GFP_HIGH instead PF_MEMALLOC
@ 2009-11-18  6:17       ` KOSAKI Motohiro
  0 siblings, 0 replies; 108+ messages in thread
From: KOSAKI Motohiro @ 2009-11-18  6:17 UTC (permalink / raw)
  To: KOSAKI Motohiro, LKML, linux-mm, Andrew Morton, dm-devel

Hi,

Thank you for give me comment.

> On Tue, Nov 17, 2009 at 04:17:07PM +0900, KOSAKI Motohiro wrote:
> > Non MM subsystem must not use PF_MEMALLOC. Memory reclaim need few
> > memory, anyone must not prevent it. Otherwise the system cause
> > mysterious hang-up and/or OOM Killer invokation.
>  
> This code is also on the critical path, for example, if you are swapping
> onto a dm device.  (There are ways we could reduce its use further as
> not every dm ioctl needs to be on the critical path and the buffer size
> could be limited for the ioctls that do.)

May I ask one additional question?
Original code is here.

	-------------------------------------------------------
        /*
         * Trying to avoid low memory issues when a device is
         * suspended.
         */
        current->flags |= PF_MEMALLOC;

        /*
         * Copy the parameters into kernel space.
         */
        r = copy_params(user, &param);

        current->flags &= ~PF_MEMALLOC;
	-------------------------------------------------------

but PF_MEMALLOC doesn't gurantee allocation successfull. In your case,
mempoll seems better to me. copy_params seems enough small function 
and we can rewrite it. Why didn't you use mempool?

Am I missing something?


> But what situations have been causing you trouble?  The OOM killer must
> generally avoid killing userspace processes that suspend & resume dm
> devices, and there are tight restrictions on what those processes
> can do safely between suspending and resuming.

No. This is theorical issue. but I really want to avoid stress weakness
kernel.





--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 108+ messages in thread

* Re: [PATCH 1/7] dm: use __GFP_HIGH instead PF_MEMALLOC
  2009-11-17 13:15     ` Alasdair G Kergon
  (?)
  (?)
@ 2009-11-18  6:17     ` KOSAKI Motohiro
  -1 siblings, 0 replies; 108+ messages in thread
From: KOSAKI Motohiro @ 2009-11-18  6:17 UTC (permalink / raw)
  To: LKML, linux-mm, Andrew Morton, dm-devel; +Cc: kosaki.motohiro

Hi,

Thank you for give me comment.

> On Tue, Nov 17, 2009 at 04:17:07PM +0900, KOSAKI Motohiro wrote:
> > Non MM subsystem must not use PF_MEMALLOC. Memory reclaim need few
> > memory, anyone must not prevent it. Otherwise the system cause
> > mysterious hang-up and/or OOM Killer invokation.
>  
> This code is also on the critical path, for example, if you are swapping
> onto a dm device.  (There are ways we could reduce its use further as
> not every dm ioctl needs to be on the critical path and the buffer size
> could be limited for the ioctls that do.)

May I ask one additional question?
Original code is here.

	-------------------------------------------------------
        /*
         * Trying to avoid low memory issues when a device is
         * suspended.
         */
        current->flags |= PF_MEMALLOC;

        /*
         * Copy the parameters into kernel space.
         */
        r = copy_params(user, &param);

        current->flags &= ~PF_MEMALLOC;
	-------------------------------------------------------

but PF_MEMALLOC doesn't gurantee allocation successfull. In your case,
mempoll seems better to me. copy_params seems enough small function 
and we can rewrite it. Why didn't you use mempool?

Am I missing something?


> But what situations have been causing you trouble?  The OOM killer must
> generally avoid killing userspace processes that suspend & resume dm
> devices, and there are tight restrictions on what those processes
> can do safely between suspending and resuming.

No. This is theorical issue. but I really want to avoid stress weakness
kernel.

^ permalink raw reply	[flat|nested] 108+ messages in thread

* Re: [PATCH 6/7] cifs: Don't use PF_MEMALLOC
  2009-11-17 16:40     ` Steve French
@ 2009-11-18  6:31         ` KOSAKI Motohiro
  0 siblings, 0 replies; 108+ messages in thread
From: KOSAKI Motohiro @ 2009-11-18  6:31 UTC (permalink / raw)
  To: Steve French
  Cc: kosaki.motohiro, Jeff Layton, LKML, samba-technical,
	Steve French, linux-mm, Andrew Morton, linux-cifs-client

> It is hard to follow exactly what this flag does in /mm (other than try
> harder on memory allocations) - I haven't found much about this flag (e.g.
> http://lwn.net/Articles/246928/) but it does look like most of the fs no
> longer set this (except xfs) e.g. ext3_ordered_writepage.  When running out
> of memory in the cifs_demultiplex_thread it will retry 3 seconds later, but
> if memory allocations ever fail in this path we could potentially be holding
> up (an already issued write in) writepages for that period by not having
> memory to get the response to see if the write succeeded.
> 
> We pass in few flags for these memory allocation requests: GFP_NOFS (on the
> mempool_alloc) and SLAB_HWCACHE_ALIGN (on the kmem_cache_create of the pool)
> should we be passing in other flags on the allocations?

I don't think you  need change more.




^ permalink raw reply	[flat|nested] 108+ messages in thread

* Re: [PATCH 6/7] cifs: Don't use PF_MEMALLOC
@ 2009-11-18  6:31         ` KOSAKI Motohiro
  0 siblings, 0 replies; 108+ messages in thread
From: KOSAKI Motohiro @ 2009-11-18  6:31 UTC (permalink / raw)
  To: Steve French
  Cc: kosaki.motohiro, Jeff Layton, LKML, samba-technical,
	Steve French, linux-mm, Andrew Morton, linux-cifs-client

> It is hard to follow exactly what this flag does in /mm (other than try
> harder on memory allocations) - I haven't found much about this flag (e.g.
> http://lwn.net/Articles/246928/) but it does look like most of the fs no
> longer set this (except xfs) e.g. ext3_ordered_writepage.  When running out
> of memory in the cifs_demultiplex_thread it will retry 3 seconds later, but
> if memory allocations ever fail in this path we could potentially be holding
> up (an already issued write in) writepages for that period by not having
> memory to get the response to see if the write succeeded.
> 
> We pass in few flags for these memory allocation requests: GFP_NOFS (on the
> mempool_alloc) and SLAB_HWCACHE_ALIGN (on the kmem_cache_create of the pool)
> should we be passing in other flags on the allocations?

I don't think you  need change more.



--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 108+ messages in thread

* Re: [PATCH 7/7] xfs: Don't use PF_MEMALLOC
  2009-11-17 22:11     ` Dave Chinner
  (?)
  (?)
@ 2009-11-18  8:56       ` KOSAKI Motohiro
  -1 siblings, 0 replies; 108+ messages in thread
From: KOSAKI Motohiro @ 2009-11-18  8:56 UTC (permalink / raw)
  To: Dave Chinner
  Cc: kosaki.motohiro, LKML, linux-mm, Andrew Morton,
	Christoph Hellwig, linux-fsdevel, xfs-masters, xfs

> On Tue, Nov 17, 2009 at 04:23:43PM +0900, KOSAKI Motohiro wrote:
> > 
> > Non MM subsystem must not use PF_MEMALLOC. Memory reclaim need few
> > memory, anyone must not prevent it. Otherwise the system cause
> > mysterious hang-up and/or OOM Killer invokation.
> 
> The xfsbufd is a woken run by a registered memory shaker. i.e. it
> runs when the system needs to reclaim memory. It forceѕ the
> delayed write metadata buffers (of which there can be a lot) to disk
> so that they can be reclaimed on IO completion. This IO submission
> may require ѕome memory to be allocated to be able to free that
> memory.
> 
> Hence, AFAICT the use of PF_MEMALLOC is valid here.

Thanks a lot. 
I have one additional question, may I ask you?

How can we calculate maximum memory usage in xfsbufd?
I'm afraid that VM and XFS works properly but adding two makes memory exhaust.

And, I conclude XFS doesn't need sharing reservation memory with VM,
it only need non failed allocation. right? IOW I'm prefer perter's
suggestion.



^ permalink raw reply	[flat|nested] 108+ messages in thread

* Re: [PATCH 7/7] xfs: Don't use PF_MEMALLOC
@ 2009-11-18  8:56       ` KOSAKI Motohiro
  0 siblings, 0 replies; 108+ messages in thread
From: KOSAKI Motohiro @ 2009-11-18  8:56 UTC (permalink / raw)
  To: Dave Chinner
  Cc: kosaki.motohiro, LKML, linux-mm, Andrew Morton,
	Christoph Hellwig, linux-fsdevel, xfs-masters, xfs

> On Tue, Nov 17, 2009 at 04:23:43PM +0900, KOSAKI Motohiro wrote:
> > 
> > Non MM subsystem must not use PF_MEMALLOC. Memory reclaim need few
> > memory, anyone must not prevent it. Otherwise the system cause
> > mysterious hang-up and/or OOM Killer invokation.
> 
> The xfsbufd is a woken run by a registered memory shaker. i.e. it
> runs when the system needs to reclaim memory. It forceѕ the
> delayed write metadata buffers (of which there can be a lot) to disk
> so that they can be reclaimed on IO completion. This IO submission
> may require ѕome memory to be allocated to be able to free that
> memory.
> 
> Hence, AFAICT the use of PF_MEMALLOC is valid here.

Thanks a lot. 
I have one additional question, may I ask you?

How can we calculate maximum memory usage in xfsbufd?
I'm afraid that VM and XFS works properly but adding two makes memory exhaust.

And, I conclude XFS doesn't need sharing reservation memory with VM,
it only need non failed allocation. right? IOW I'm prefer perter's
suggestion.


--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 108+ messages in thread

* Re: [PATCH 7/7] xfs: Don't use PF_MEMALLOC
@ 2009-11-18  8:56       ` KOSAKI Motohiro
  0 siblings, 0 replies; 108+ messages in thread
From: KOSAKI Motohiro @ 2009-11-18  8:56 UTC (permalink / raw)
  To: Dave Chinner
  Cc: xfs-masters, LKML, xfs, Christoph Hellwig, linux-mm,
	kosaki.motohiro, linux-fsdevel, Andrew Morton

> On Tue, Nov 17, 2009 at 04:23:43PM +0900, KOSAKI Motohiro wrote:
> > 
> > Non MM subsystem must not use PF_MEMALLOC. Memory reclaim need few
> > memory, anyone must not prevent it. Otherwise the system cause
> > mysterious hang-up and/or OOM Killer invokation.
> 
> The xfsbufd is a woken run by a registered memory shaker. i.e. it
> runs when the system needs to reclaim memory. It forceѕ the
> delayed write metadata buffers (of which there can be a lot) to disk
> so that they can be reclaimed on IO completion. This IO submission
> may require ѕome memory to be allocated to be able to free that
> memory.
> 
> Hence, AFAICT the use of PF_MEMALLOC is valid here.

Thanks a lot. 
I have one additional question, may I ask you?

How can we calculate maximum memory usage in xfsbufd?
I'm afraid that VM and XFS works properly but adding two makes memory exhaust.

And, I conclude XFS doesn't need sharing reservation memory with VM,
it only need non failed allocation. right? IOW I'm prefer perter's
suggestion.


_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 108+ messages in thread

* Re: [PATCH 7/7] xfs: Don't use PF_MEMALLOC
@ 2009-11-18  8:56       ` KOSAKI Motohiro
  0 siblings, 0 replies; 108+ messages in thread
From: KOSAKI Motohiro @ 2009-11-18  8:56 UTC (permalink / raw)
  To: Dave Chinner
  Cc: kosaki.motohiro, LKML, linux-mm, Andrew Morton,
	Christoph Hellwig, linux-fsdevel, xfs-masters, xfs

> On Tue, Nov 17, 2009 at 04:23:43PM +0900, KOSAKI Motohiro wrote:
> > 
> > Non MM subsystem must not use PF_MEMALLOC. Memory reclaim need few
> > memory, anyone must not prevent it. Otherwise the system cause
> > mysterious hang-up and/or OOM Killer invokation.
> 
> The xfsbufd is a woken run by a registered memory shaker. i.e. it
> runs when the system needs to reclaim memory. It forceN? the
> delayed write metadata buffers (of which there can be a lot) to disk
> so that they can be reclaimed on IO completion. This IO submission
> may require N?ome memory to be allocated to be able to free that
> memory.
> 
> Hence, AFAICT the use of PF_MEMALLOC is valid here.

Thanks a lot. 
I have one additional question, may I ask you?

How can we calculate maximum memory usage in xfsbufd?
I'm afraid that VM and XFS works properly but adding two makes memory exhaust.

And, I conclude XFS doesn't need sharing reservation memory with VM,
it only need non failed allocation. right? IOW I'm prefer perter's
suggestion.


--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 108+ messages in thread

* Re: [PATCH 2/7] mmc: Don't use PF_MEMALLOC
  2009-11-18  0:01             ` Minchan Kim
@ 2009-11-18  9:56               ` Peter Zijlstra
  -1 siblings, 0 replies; 108+ messages in thread
From: Peter Zijlstra @ 2009-11-18  9:56 UTC (permalink / raw)
  To: Minchan Kim
  Cc: KOSAKI Motohiro, Alan Cox, LKML, linux-mm, Andrew Morton, linux-mmc

On Wed, 2009-11-18 at 09:01 +0900, Minchan Kim wrote:
> Hi, Peter.
> 
> First of all, Thanks for the commenting.
> 
> On Wed, Nov 18, 2009 at 5:47 AM, Peter Zijlstra <peterz@infradead.org> wrote:
> > On Tue, 2009-11-17 at 21:51 +0900, Minchan Kim wrote:
> >> I think it's because mempool reserves memory.
> >> (# of I/O issue\0 is hard to be expected.
> >> How do we determine mempool size of each block driver?
> >> For example,  maybe, server use few I/O for nand.
> >> but embedded system uses a lot of I/O.
> >
> > No, you scale the mempool to the minimum amount required to make
> > progress -- this includes limiting the 'concurrency' when handing out
> > mempool objects.
> >
> > If you run into such tight corners often enough to notice it, there's
> > something else wrong.
> >
> > I fully agree with ripping out PF_MEMALLOC from pretty much everything,
> > including the VM, getting rid of the various abuse outside of the VM
> > seems like a very good start.
> >
> 
> I am not against removing PF_MEMALLOC.
> Totally, I agree to prevent abusing of PF_MEMALLOC.
> 
> What I have a concern is per-block mempool.
> Although it's minimum amount of mempool, it can be increased
> by adding new block driver. I am not sure how many we will have block driver.
> 
> And, person who develop new driver always have to use mempool and consider
> what is minimum of mempool.
> I think this is a problem of mempool, now.
> 
> How about this?
> According to system memory, kernel have just one mempool for I/O which
> is one shared by several block driver.
> 
> And we make new API block driver can use.
> Of course, as usual It can use dynamic memoy. Only it can use mempool if
> system don't have much dynamic memory.
> 
> In this case, we can control read/write path. read I/O can't help
> memory reclaiming.
> So I think read I/O don't use mempool, I am not sure. :)

Sure some generic blocklevel infrastructure might work, _but_ you cannot
take away the responsibility of determining the amount of memory needed,
nor does any of this have any merit if you do not limit yourself to that
amount.

Current PF_MEMALLOC usage in the VM is utterly broken in that we can
have a basically unlimited amount of tasks hit direct reclaim and all of
them will then consume PF_MEMALLOC, which mean we can easily run out of
memory.

( unless I missed the direct reclaim throttle patches going in, which
isn't at all impossible )




^ permalink raw reply	[flat|nested] 108+ messages in thread

* Re: [PATCH 2/7] mmc: Don't use PF_MEMALLOC
@ 2009-11-18  9:56               ` Peter Zijlstra
  0 siblings, 0 replies; 108+ messages in thread
From: Peter Zijlstra @ 2009-11-18  9:56 UTC (permalink / raw)
  To: Minchan Kim
  Cc: KOSAKI Motohiro, Alan Cox, LKML, linux-mm, Andrew Morton, linux-mmc

On Wed, 2009-11-18 at 09:01 +0900, Minchan Kim wrote:
> Hi, Peter.
> 
> First of all, Thanks for the commenting.
> 
> On Wed, Nov 18, 2009 at 5:47 AM, Peter Zijlstra <peterz@infradead.org> wrote:
> > On Tue, 2009-11-17 at 21:51 +0900, Minchan Kim wrote:
> >> I think it's because mempool reserves memory.
> >> (# of I/O issue\0 is hard to be expected.
> >> How do we determine mempool size of each block driver?
> >> For example,  maybe, server use few I/O for nand.
> >> but embedded system uses a lot of I/O.
> >
> > No, you scale the mempool to the minimum amount required to make
> > progress -- this includes limiting the 'concurrency' when handing out
> > mempool objects.
> >
> > If you run into such tight corners often enough to notice it, there's
> > something else wrong.
> >
> > I fully agree with ripping out PF_MEMALLOC from pretty much everything,
> > including the VM, getting rid of the various abuse outside of the VM
> > seems like a very good start.
> >
> 
> I am not against removing PF_MEMALLOC.
> Totally, I agree to prevent abusing of PF_MEMALLOC.
> 
> What I have a concern is per-block mempool.
> Although it's minimum amount of mempool, it can be increased
> by adding new block driver. I am not sure how many we will have block driver.
> 
> And, person who develop new driver always have to use mempool and consider
> what is minimum of mempool.
> I think this is a problem of mempool, now.
> 
> How about this?
> According to system memory, kernel have just one mempool for I/O which
> is one shared by several block driver.
> 
> And we make new API block driver can use.
> Of course, as usual It can use dynamic memoy. Only it can use mempool if
> system don't have much dynamic memory.
> 
> In this case, we can control read/write path. read I/O can't help
> memory reclaiming.
> So I think read I/O don't use mempool, I am not sure. :)

Sure some generic blocklevel infrastructure might work, _but_ you cannot
take away the responsibility of determining the amount of memory needed,
nor does any of this have any merit if you do not limit yourself to that
amount.

Current PF_MEMALLOC usage in the VM is utterly broken in that we can
have a basically unlimited amount of tasks hit direct reclaim and all of
them will then consume PF_MEMALLOC, which mean we can easily run out of
memory.

( unless I missed the direct reclaim throttle patches going in, which
isn't at all impossible )



--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 108+ messages in thread

* Re: [PATCH 2/7] mmc: Don't use PF_MEMALLOC
  2009-11-18  9:56               ` Peter Zijlstra
@ 2009-11-18 10:31                 ` Minchan Kim
  -1 siblings, 0 replies; 108+ messages in thread
From: Minchan Kim @ 2009-11-18 10:31 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: KOSAKI Motohiro, Alan Cox, LKML, linux-mm, Andrew Morton, linux-mmc

On Wed, Nov 18, 2009 at 6:56 PM, Peter Zijlstra <peterz@infradead.org> wrote:
> On Wed, 2009-11-18 at 09:01 +0900, Minchan Kim wrote:
>> Hi, Peter.
>>
>> First of all, Thanks for the commenting.
>>
>> On Wed, Nov 18, 2009 at 5:47 AM, Peter Zijlstra <peterz@infradead.org> wrote:
>> > On Tue, 2009-11-17 at 21:51 +0900, Minchan Kim wrote:
>> >> I think it's because mempool reserves memory.
>> >> (# of I/O issue\0 is hard to be expected.
>> >> How do we determine mempool size of each block driver?
>> >> For example,  maybe, server use few I/O for nand.
>> >> but embedded system uses a lot of I/O.
>> >
>> > No, you scale the mempool to the minimum amount required to make
>> > progress -- this includes limiting the 'concurrency' when handing out
>> > mempool objects.
>> >
>> > If you run into such tight corners often enough to notice it, there's
>> > something else wrong.
>> >
>> > I fully agree with ripping out PF_MEMALLOC from pretty much everything,
>> > including the VM, getting rid of the various abuse outside of the VM
>> > seems like a very good start.
>> >
>>
>> I am not against removing PF_MEMALLOC.
>> Totally, I agree to prevent abusing of PF_MEMALLOC.
>>
>> What I have a concern is per-block mempool.
>> Although it's minimum amount of mempool, it can be increased
>> by adding new block driver. I am not sure how many we will have block driver.
>>
>> And, person who develop new driver always have to use mempool and consider
>> what is minimum of mempool.
>> I think this is a problem of mempool, now.
>>
>> How about this?
>> According to system memory, kernel have just one mempool for I/O which
>> is one shared by several block driver.
>>
>> And we make new API block driver can use.
>> Of course, as usual It can use dynamic memoy. Only it can use mempool if
>> system don't have much dynamic memory.
>>
>> In this case, we can control read/write path. read I/O can't help
>> memory reclaiming.
>> So I think read I/O don't use mempool, I am not sure. :)
>
> Sure some generic blocklevel infrastructure might work, _but_ you cannot
> take away the responsibility of determining the amount of memory needed,
> nor does any of this have any merit if you do not limit yourself to that
> amount.

Yes. Some one have to take a responsibility.

The intention was we could take away the responsibility from block driver.
Instead of driver, VM would take the responsibility.

You mean althgouth VM could take the responsiblity, it is hard to
expect amout of pages
needed by block drivers?

Yes, I agree.

>
> Current PF_MEMALLOC usage in the VM is utterly broken in that we can
> have a basically unlimited amount of tasks hit direct reclaim and all of
> them will then consume PF_MEMALLOC, which mean we can easily run out of
> memory.
>
> ( unless I missed the direct reclaim throttle patches going in, which
> isn't at all impossible )

I think we can prevent it at least.  Kosaki already submitted the patches. :)
(too_many_isolated functions).


I am looking forward to kosaki's next version.

Thanks for careful comment, Peter.
Thanks for submitting good issue, Kosaki. :)

>
>
>



-- 
Kind regards,
Minchan Kim

^ permalink raw reply	[flat|nested] 108+ messages in thread

* Re: [PATCH 2/7] mmc: Don't use PF_MEMALLOC
@ 2009-11-18 10:31                 ` Minchan Kim
  0 siblings, 0 replies; 108+ messages in thread
From: Minchan Kim @ 2009-11-18 10:31 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: KOSAKI Motohiro, Alan Cox, LKML, linux-mm, Andrew Morton, linux-mmc

On Wed, Nov 18, 2009 at 6:56 PM, Peter Zijlstra <peterz@infradead.org> wrote:
> On Wed, 2009-11-18 at 09:01 +0900, Minchan Kim wrote:
>> Hi, Peter.
>>
>> First of all, Thanks for the commenting.
>>
>> On Wed, Nov 18, 2009 at 5:47 AM, Peter Zijlstra <peterz@infradead.org> wrote:
>> > On Tue, 2009-11-17 at 21:51 +0900, Minchan Kim wrote:
>> >> I think it's because mempool reserves memory.
>> >> (# of I/O issue\0 is hard to be expected.
>> >> How do we determine mempool size of each block driver?
>> >> For example,  maybe, server use few I/O for nand.
>> >> but embedded system uses a lot of I/O.
>> >
>> > No, you scale the mempool to the minimum amount required to make
>> > progress -- this includes limiting the 'concurrency' when handing out
>> > mempool objects.
>> >
>> > If you run into such tight corners often enough to notice it, there's
>> > something else wrong.
>> >
>> > I fully agree with ripping out PF_MEMALLOC from pretty much everything,
>> > including the VM, getting rid of the various abuse outside of the VM
>> > seems like a very good start.
>> >
>>
>> I am not against removing PF_MEMALLOC.
>> Totally, I agree to prevent abusing of PF_MEMALLOC.
>>
>> What I have a concern is per-block mempool.
>> Although it's minimum amount of mempool, it can be increased
>> by adding new block driver. I am not sure how many we will have block driver.
>>
>> And, person who develop new driver always have to use mempool and consider
>> what is minimum of mempool.
>> I think this is a problem of mempool, now.
>>
>> How about this?
>> According to system memory, kernel have just one mempool for I/O which
>> is one shared by several block driver.
>>
>> And we make new API block driver can use.
>> Of course, as usual It can use dynamic memoy. Only it can use mempool if
>> system don't have much dynamic memory.
>>
>> In this case, we can control read/write path. read I/O can't help
>> memory reclaiming.
>> So I think read I/O don't use mempool, I am not sure. :)
>
> Sure some generic blocklevel infrastructure might work, _but_ you cannot
> take away the responsibility of determining the amount of memory needed,
> nor does any of this have any merit if you do not limit yourself to that
> amount.

Yes. Some one have to take a responsibility.

The intention was we could take away the responsibility from block driver.
Instead of driver, VM would take the responsibility.

You mean althgouth VM could take the responsiblity, it is hard to
expect amout of pages
needed by block drivers?

Yes, I agree.

>
> Current PF_MEMALLOC usage in the VM is utterly broken in that we can
> have a basically unlimited amount of tasks hit direct reclaim and all of
> them will then consume PF_MEMALLOC, which mean we can easily run out of
> memory.
>
> ( unless I missed the direct reclaim throttle patches going in, which
> isn't at all impossible )

I think we can prevent it at least.  Kosaki already submitted the patches. :)
(too_many_isolated functions).


I am looking forward to kosaki's next version.

Thanks for careful comment, Peter.
Thanks for submitting good issue, Kosaki. :)

>
>
>



-- 
Kind regards,
Minchan Kim

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 108+ messages in thread

* Re: [PATCH 2/7] mmc: Don't use PF_MEMALLOC
  2009-11-18 10:31                 ` Minchan Kim
@ 2009-11-18 10:54                   ` Peter Zijlstra
  -1 siblings, 0 replies; 108+ messages in thread
From: Peter Zijlstra @ 2009-11-18 10:54 UTC (permalink / raw)
  To: Minchan Kim
  Cc: KOSAKI Motohiro, Alan Cox, LKML, linux-mm, Andrew Morton, linux-mmc

On Wed, 2009-11-18 at 19:31 +0900, Minchan Kim wrote:
> >
> > Sure some generic blocklevel infrastructure might work, _but_ you cannot
> > take away the responsibility of determining the amount of memory needed,
> > nor does any of this have any merit if you do not limit yourself to that
> > amount.
> 
> Yes. Some one have to take a responsibility.
> 
> The intention was we could take away the responsibility from block driver.
> Instead of driver, VM would take the responsibility.
> 
> You mean althgouth VM could take the responsiblity, it is hard to
> expect amout of pages needed by block drivers? 

Correct, its near impossible for the VM to accurately guess the amount
of memory needed for a driver, or limit the usage of the driver.

The driver could be very simple in that it'll just start a DMA on the
page and get an interrupt when done, not consuming much (if any) memory
beyond the generic BIO structure, but it could also be some iSCSI
monstrosity which involves the full network stack and userspace.

That is why I generally prefer the user of PF_MEMALLOC to take
responsibility, because it knows its own consumption and can limit its
own consumption.

Now, I don't think (but I could be wring here) that you need to bother
with PF_MEMALLOC unless you're swapping. File based pages should always
be able to free some memory due to the dirty-limit, whcih basically
guarantees that there are some clean file pages for every dirty file
page.

My swap-over-nfs series used to have a block-layer hook to expose the
swap-over-block behaviour:

http://programming.kicks-ass.net/kernel-patches/vm_deadlock/v12.99/blk_queue_swapdev.patch

That gives block devices the power to refuse being swapped over, which
could be an alternative to using PF_MEMALLOC.


^ permalink raw reply	[flat|nested] 108+ messages in thread

* Re: [PATCH 2/7] mmc: Don't use PF_MEMALLOC
@ 2009-11-18 10:54                   ` Peter Zijlstra
  0 siblings, 0 replies; 108+ messages in thread
From: Peter Zijlstra @ 2009-11-18 10:54 UTC (permalink / raw)
  To: Minchan Kim
  Cc: KOSAKI Motohiro, Alan Cox, LKML, linux-mm, Andrew Morton, linux-mmc

On Wed, 2009-11-18 at 19:31 +0900, Minchan Kim wrote:
> >
> > Sure some generic blocklevel infrastructure might work, _but_ you cannot
> > take away the responsibility of determining the amount of memory needed,
> > nor does any of this have any merit if you do not limit yourself to that
> > amount.
> 
> Yes. Some one have to take a responsibility.
> 
> The intention was we could take away the responsibility from block driver.
> Instead of driver, VM would take the responsibility.
> 
> You mean althgouth VM could take the responsiblity, it is hard to
> expect amout of pages needed by block drivers? 

Correct, its near impossible for the VM to accurately guess the amount
of memory needed for a driver, or limit the usage of the driver.

The driver could be very simple in that it'll just start a DMA on the
page and get an interrupt when done, not consuming much (if any) memory
beyond the generic BIO structure, but it could also be some iSCSI
monstrosity which involves the full network stack and userspace.

That is why I generally prefer the user of PF_MEMALLOC to take
responsibility, because it knows its own consumption and can limit its
own consumption.

Now, I don't think (but I could be wring here) that you need to bother
with PF_MEMALLOC unless you're swapping. File based pages should always
be able to free some memory due to the dirty-limit, whcih basically
guarantees that there are some clean file pages for every dirty file
page.

My swap-over-nfs series used to have a block-layer hook to expose the
swap-over-block behaviour:

http://programming.kicks-ass.net/kernel-patches/vm_deadlock/v12.99/blk_queue_swapdev.patch

That gives block devices the power to refuse being swapped over, which
could be an alternative to using PF_MEMALLOC.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 108+ messages in thread

* Re: [PATCH 2/7] mmc: Don't use PF_MEMALLOC
  2009-11-18 10:54                   ` Peter Zijlstra
@ 2009-11-18 11:15                     ` Minchan Kim
  -1 siblings, 0 replies; 108+ messages in thread
From: Minchan Kim @ 2009-11-18 11:15 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: KOSAKI Motohiro, Alan Cox, LKML, linux-mm, Andrew Morton, linux-mmc

On Wed, Nov 18, 2009 at 7:54 PM, Peter Zijlstra <peterz@infradead.org> wrote:
> On Wed, 2009-11-18 at 19:31 +0900, Minchan Kim wrote:
>> >
>> > Sure some generic blocklevel infrastructure might work, _but_ you cannot
>> > take away the responsibility of determining the amount of memory needed,
>> > nor does any of this have any merit if you do not limit yourself to that
>> > amount.
>>
>> Yes. Some one have to take a responsibility.
>>
>> The intention was we could take away the responsibility from block driver.
>> Instead of driver, VM would take the responsibility.
>>
>> You mean althgouth VM could take the responsiblity, it is hard to
>> expect amout of pages needed by block drivers?
>
> Correct, its near impossible for the VM to accurately guess the amount
> of memory needed for a driver, or limit the usage of the driver.
>
> The driver could be very simple in that it'll just start a DMA on the
> page and get an interrupt when done, not consuming much (if any) memory
> beyond the generic BIO structure, but it could also be some iSCSI
> monstrosity which involves the full network stack and userspace.

Wow, Thanks for good example.
Until now, I don't know iSCSI is such memory hog driver.

> That is why I generally prefer the user of PF_MEMALLOC to take
> responsibility, because it knows its own consumption and can limit its
> own consumption.

Okay. I understand your point by good explanation.

> Now, I don't think (but I could be wring here) that you need to bother
> with PF_MEMALLOC unless you're swapping. File based pages should always
> be able to free some memory due to the dirty-limit, whcih basically
> guarantees that there are some clean file pages for every dirty file
> page.
>
> My swap-over-nfs series used to have a block-layer hook to expose the
> swap-over-block behaviour:
>
> http://programming.kicks-ass.net/kernel-patches/vm_deadlock/v12.99/blk_queue_swapdev.patch
>
> That gives block devices the power to refuse being swapped over, which
> could be an alternative to using PF_MEMALLOC.
>

Thanks for noticing me.
I will look at your patches.
Thanks again, Peter.





-- 
Kind regards,
Minchan Kim

^ permalink raw reply	[flat|nested] 108+ messages in thread

* Re: [PATCH 2/7] mmc: Don't use PF_MEMALLOC
@ 2009-11-18 11:15                     ` Minchan Kim
  0 siblings, 0 replies; 108+ messages in thread
From: Minchan Kim @ 2009-11-18 11:15 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: KOSAKI Motohiro, Alan Cox, LKML, linux-mm, Andrew Morton, linux-mmc

On Wed, Nov 18, 2009 at 7:54 PM, Peter Zijlstra <peterz@infradead.org> wrote:
> On Wed, 2009-11-18 at 19:31 +0900, Minchan Kim wrote:
>> >
>> > Sure some generic blocklevel infrastructure might work, _but_ you cannot
>> > take away the responsibility of determining the amount of memory needed,
>> > nor does any of this have any merit if you do not limit yourself to that
>> > amount.
>>
>> Yes. Some one have to take a responsibility.
>>
>> The intention was we could take away the responsibility from block driver.
>> Instead of driver, VM would take the responsibility.
>>
>> You mean althgouth VM could take the responsiblity, it is hard to
>> expect amout of pages needed by block drivers?
>
> Correct, its near impossible for the VM to accurately guess the amount
> of memory needed for a driver, or limit the usage of the driver.
>
> The driver could be very simple in that it'll just start a DMA on the
> page and get an interrupt when done, not consuming much (if any) memory
> beyond the generic BIO structure, but it could also be some iSCSI
> monstrosity which involves the full network stack and userspace.

Wow, Thanks for good example.
Until now, I don't know iSCSI is such memory hog driver.

> That is why I generally prefer the user of PF_MEMALLOC to take
> responsibility, because it knows its own consumption and can limit its
> own consumption.

Okay. I understand your point by good explanation.

> Now, I don't think (but I could be wring here) that you need to bother
> with PF_MEMALLOC unless you're swapping. File based pages should always
> be able to free some memory due to the dirty-limit, whcih basically
> guarantees that there are some clean file pages for every dirty file
> page.
>
> My swap-over-nfs series used to have a block-layer hook to expose the
> swap-over-block behaviour:
>
> http://programming.kicks-ass.net/kernel-patches/vm_deadlock/v12.99/blk_queue_swapdev.patch
>
> That gives block devices the power to refuse being swapped over, which
> could be an alternative to using PF_MEMALLOC.
>

Thanks for noticing me.
I will look at your patches.
Thanks again, Peter.





-- 
Kind regards,
Minchan Kim

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 108+ messages in thread

* Re: [PATCH 7/7] xfs: Don't use PF_MEMALLOC
  2009-11-18  8:56       ` KOSAKI Motohiro
  (?)
@ 2009-11-18 22:16         ` Dave Chinner
  -1 siblings, 0 replies; 108+ messages in thread
From: Dave Chinner @ 2009-11-18 22:16 UTC (permalink / raw)
  To: KOSAKI Motohiro
  Cc: LKML, linux-mm, Andrew Morton, Christoph Hellwig, linux-fsdevel,
	xfs-masters, xfs

On Wed, Nov 18, 2009 at 05:56:46PM +0900, KOSAKI Motohiro wrote:
> > On Tue, Nov 17, 2009 at 04:23:43PM +0900, KOSAKI Motohiro wrote:
> > > 
> > > Non MM subsystem must not use PF_MEMALLOC. Memory reclaim need few
> > > memory, anyone must not prevent it. Otherwise the system cause
> > > mysterious hang-up and/or OOM Killer invokation.
> > 
> > The xfsbufd is a woken run by a registered memory shaker. i.e. it
> > runs when the system needs to reclaim memory. It forceѕ the
> > delayed write metadata buffers (of which there can be a lot) to disk
> > so that they can be reclaimed on IO completion. This IO submission
> > may require ѕome memory to be allocated to be able to free that
> > memory.
> > 
> > Hence, AFAICT the use of PF_MEMALLOC is valid here.
> 
> Thanks a lot. 
> I have one additional question, may I ask you?
> 
> How can we calculate maximum memory usage in xfsbufd?

It doesn't get calculated at the moment. It is very difficult to
calculate a usable size metric for it because there are multiple
caches (up to 3 per filesystem), and dentry/inode reclaim causes the
size of the cache to grow.  Hence the size of the cache is not
really something that can be considered a stable or predictable
input into a "reclaim now" calculation. As such we simply cause
xfsbufd run simultaneously with the shrinkers that cause it to
grow....

> I'm afraid that VM and XFS works properly but adding two makes memory exhaust.

I don't understand what you are trying to say here.

> And, I conclude XFS doesn't need sharing reservation memory with VM,
> it only need non failed allocation. right? IOW I'm prefer perter's
> suggestion.

Right. However, it is worth keeping in mind that this is a
performance critical path for inode reclaim.  Hence any throttling
of allocation will slow down the rate at which memory is freed by
the system....

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 108+ messages in thread

* Re: [PATCH 7/7] xfs: Don't use PF_MEMALLOC
@ 2009-11-18 22:16         ` Dave Chinner
  0 siblings, 0 replies; 108+ messages in thread
From: Dave Chinner @ 2009-11-18 22:16 UTC (permalink / raw)
  To: KOSAKI Motohiro
  Cc: xfs-masters, LKML, xfs, Christoph Hellwig, linux-mm,
	linux-fsdevel, Andrew Morton

On Wed, Nov 18, 2009 at 05:56:46PM +0900, KOSAKI Motohiro wrote:
> > On Tue, Nov 17, 2009 at 04:23:43PM +0900, KOSAKI Motohiro wrote:
> > > 
> > > Non MM subsystem must not use PF_MEMALLOC. Memory reclaim need few
> > > memory, anyone must not prevent it. Otherwise the system cause
> > > mysterious hang-up and/or OOM Killer invokation.
> > 
> > The xfsbufd is a woken run by a registered memory shaker. i.e. it
> > runs when the system needs to reclaim memory. It forceѕ the
> > delayed write metadata buffers (of which there can be a lot) to disk
> > so that they can be reclaimed on IO completion. This IO submission
> > may require ѕome memory to be allocated to be able to free that
> > memory.
> > 
> > Hence, AFAICT the use of PF_MEMALLOC is valid here.
> 
> Thanks a lot. 
> I have one additional question, may I ask you?
> 
> How can we calculate maximum memory usage in xfsbufd?

It doesn't get calculated at the moment. It is very difficult to
calculate a usable size metric for it because there are multiple
caches (up to 3 per filesystem), and dentry/inode reclaim causes the
size of the cache to grow.  Hence the size of the cache is not
really something that can be considered a stable or predictable
input into a "reclaim now" calculation. As such we simply cause
xfsbufd run simultaneously with the shrinkers that cause it to
grow....

> I'm afraid that VM and XFS works properly but adding two makes memory exhaust.

I don't understand what you are trying to say here.

> And, I conclude XFS doesn't need sharing reservation memory with VM,
> it only need non failed allocation. right? IOW I'm prefer perter's
> suggestion.

Right. However, it is worth keeping in mind that this is a
performance critical path for inode reclaim.  Hence any throttling
of allocation will slow down the rate at which memory is freed by
the system....

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 108+ messages in thread

* Re: [PATCH 7/7] xfs: Don't use PF_MEMALLOC
@ 2009-11-18 22:16         ` Dave Chinner
  0 siblings, 0 replies; 108+ messages in thread
From: Dave Chinner @ 2009-11-18 22:16 UTC (permalink / raw)
  To: KOSAKI Motohiro
  Cc: LKML, linux-mm, Andrew Morton, Christoph Hellwig, linux-fsdevel,
	xfs-masters, xfs

On Wed, Nov 18, 2009 at 05:56:46PM +0900, KOSAKI Motohiro wrote:
> > On Tue, Nov 17, 2009 at 04:23:43PM +0900, KOSAKI Motohiro wrote:
> > > 
> > > Non MM subsystem must not use PF_MEMALLOC. Memory reclaim need few
> > > memory, anyone must not prevent it. Otherwise the system cause
> > > mysterious hang-up and/or OOM Killer invokation.
> > 
> > The xfsbufd is a woken run by a registered memory shaker. i.e. it
> > runs when the system needs to reclaim memory. It forceN? the
> > delayed write metadata buffers (of which there can be a lot) to disk
> > so that they can be reclaimed on IO completion. This IO submission
> > may require N?ome memory to be allocated to be able to free that
> > memory.
> > 
> > Hence, AFAICT the use of PF_MEMALLOC is valid here.
> 
> Thanks a lot. 
> I have one additional question, may I ask you?
> 
> How can we calculate maximum memory usage in xfsbufd?

It doesn't get calculated at the moment. It is very difficult to
calculate a usable size metric for it because there are multiple
caches (up to 3 per filesystem), and dentry/inode reclaim causes the
size of the cache to grow.  Hence the size of the cache is not
really something that can be considered a stable or predictable
input into a "reclaim now" calculation. As such we simply cause
xfsbufd run simultaneously with the shrinkers that cause it to
grow....

> I'm afraid that VM and XFS works properly but adding two makes memory exhaust.

I don't understand what you are trying to say here.

> And, I conclude XFS doesn't need sharing reservation memory with VM,
> it only need non failed allocation. right? IOW I'm prefer perter's
> suggestion.

Right. However, it is worth keeping in mind that this is a
performance critical path for inode reclaim.  Hence any throttling
of allocation will slow down the rate at which memory is freed by
the system....

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 108+ messages in thread

* Re: [PATCH 4/7] nandsim: Don't use PF_MEMALLOC
  2009-11-17  7:19   ` KOSAKI Motohiro
  (?)
@ 2009-11-23 15:00     ` Artem Bityutskiy
  -1 siblings, 0 replies; 108+ messages in thread
From: Artem Bityutskiy @ 2009-11-23 15:00 UTC (permalink / raw)
  To: KOSAKI Motohiro
  Cc: LKML, linux-mm, Andrew Morton, David Woodhouse, linux-mtd, Adrian Hunter

On Tue, 2009-11-17 at 16:19 +0900, KOSAKI Motohiro wrote:
> Non MM subsystem must not use PF_MEMALLOC. Memory reclaim need few
> memory, anyone must not prevent it. Otherwise the system cause
> mysterious hang-up and/or OOM Killer invokation.
> 
> Cc: David Woodhouse <David.Woodhouse@intel.com>
> Cc: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
> Cc: linux-mtd@lists.infradead.org
> Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
> ---
>  drivers/mtd/nand/nandsim.c |   22 ++--------------------
>  1 files changed, 2 insertions(+), 20 deletions(-)
> 
> diff --git a/drivers/mtd/nand/nandsim.c b/drivers/mtd/nand/nandsim.c
> index cd0711b..97a8bbb 100644
> --- a/drivers/mtd/nand/nandsim.c
> +++ b/drivers/mtd/nand/nandsim.c
> @@ -1322,34 +1322,18 @@ static int get_pages(struct nandsim *ns, struct file *file, size_t count, loff_t
>  	return 0;
>  }
>  
> -static int set_memalloc(void)
> -{
> -	if (current->flags & PF_MEMALLOC)
> -		return 0;
> -	current->flags |= PF_MEMALLOC;
> -	return 1;
> -}
> -
> -static void clear_memalloc(int memalloc)
> -{
> -	if (memalloc)
> -		current->flags &= ~PF_MEMALLOC;
> -}
> -
>  static ssize_t read_file(struct nandsim *ns, struct file *file, void *buf, size_t count, loff_t *pos)
>  {
>  	mm_segment_t old_fs;
>  	ssize_t tx;
> -	int err, memalloc;
> +	int err;
>  
>  	err = get_pages(ns, file, count, *pos);
>  	if (err)
>  		return err;
>  	old_fs = get_fs();
>  	set_fs(get_ds());
> -	memalloc = set_memalloc();
>  	tx = vfs_read(file, (char __user *)buf, count, pos);
> -	clear_memalloc(memalloc);
>  	set_fs(old_fs);
>  	put_pages(ns);
>  	return tx;
> @@ -1359,16 +1343,14 @@ static ssize_t write_file(struct nandsim *ns, struct file *file, void *buf, size
>  {
>  	mm_segment_t old_fs;
>  	ssize_t tx;
> -	int err, memalloc;
> +	int err;
>  
>  	err = get_pages(ns, file, count, *pos);
>  	if (err)
>  		return err;
>  	old_fs = get_fs();
>  	set_fs(get_ds());
> -	memalloc = set_memalloc();
>  	tx = vfs_write(file, (char __user *)buf, count, pos);
> -	clear_memalloc(memalloc);
>  	set_fs(old_fs);
>  	put_pages(ns);
>  	return tx;

I vaguely remember Adrian (CCed) did this on purpose. This is for the
case when nandsim emulates NAND flash on top of a file. So there are 2
file-systems involved: one sits on top of nandsim (e.g. UBIFS) and the
other owns the file which nandsim uses (e.g., ext3).

And I really cannot remember off the top of my head why he needed
PF_MEMALLOC, but I think Adrian wanted to prevent the direct reclaim
path to re-enter, say UBIFS, and cause deadlock. But I'd thing that all
the allocations in vfs_read()/vfs_write() should be GFP_NOFS, so that
should not be a probelm?

-- 
Best Regards,
Artem Bityutskiy (Артём Битюцкий)


^ permalink raw reply	[flat|nested] 108+ messages in thread

* Re: [PATCH 4/7] nandsim: Don't use PF_MEMALLOC
@ 2009-11-23 15:00     ` Artem Bityutskiy
  0 siblings, 0 replies; 108+ messages in thread
From: Artem Bityutskiy @ 2009-11-23 15:00 UTC (permalink / raw)
  To: KOSAKI Motohiro
  Cc: LKML, linux-mm, Andrew Morton, David Woodhouse, linux-mtd, Adrian Hunter

On Tue, 2009-11-17 at 16:19 +0900, KOSAKI Motohiro wrote:
> Non MM subsystem must not use PF_MEMALLOC. Memory reclaim need few
> memory, anyone must not prevent it. Otherwise the system cause
> mysterious hang-up and/or OOM Killer invokation.
> 
> Cc: David Woodhouse <David.Woodhouse@intel.com>
> Cc: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
> Cc: linux-mtd@lists.infradead.org
> Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
> ---
>  drivers/mtd/nand/nandsim.c |   22 ++--------------------
>  1 files changed, 2 insertions(+), 20 deletions(-)
> 
> diff --git a/drivers/mtd/nand/nandsim.c b/drivers/mtd/nand/nandsim.c
> index cd0711b..97a8bbb 100644
> --- a/drivers/mtd/nand/nandsim.c
> +++ b/drivers/mtd/nand/nandsim.c
> @@ -1322,34 +1322,18 @@ static int get_pages(struct nandsim *ns, struct file *file, size_t count, loff_t
>  	return 0;
>  }
>  
> -static int set_memalloc(void)
> -{
> -	if (current->flags & PF_MEMALLOC)
> -		return 0;
> -	current->flags |= PF_MEMALLOC;
> -	return 1;
> -}
> -
> -static void clear_memalloc(int memalloc)
> -{
> -	if (memalloc)
> -		current->flags &= ~PF_MEMALLOC;
> -}
> -
>  static ssize_t read_file(struct nandsim *ns, struct file *file, void *buf, size_t count, loff_t *pos)
>  {
>  	mm_segment_t old_fs;
>  	ssize_t tx;
> -	int err, memalloc;
> +	int err;
>  
>  	err = get_pages(ns, file, count, *pos);
>  	if (err)
>  		return err;
>  	old_fs = get_fs();
>  	set_fs(get_ds());
> -	memalloc = set_memalloc();
>  	tx = vfs_read(file, (char __user *)buf, count, pos);
> -	clear_memalloc(memalloc);
>  	set_fs(old_fs);
>  	put_pages(ns);
>  	return tx;
> @@ -1359,16 +1343,14 @@ static ssize_t write_file(struct nandsim *ns, struct file *file, void *buf, size
>  {
>  	mm_segment_t old_fs;
>  	ssize_t tx;
> -	int err, memalloc;
> +	int err;
>  
>  	err = get_pages(ns, file, count, *pos);
>  	if (err)
>  		return err;
>  	old_fs = get_fs();
>  	set_fs(get_ds());
> -	memalloc = set_memalloc();
>  	tx = vfs_write(file, (char __user *)buf, count, pos);
> -	clear_memalloc(memalloc);
>  	set_fs(old_fs);
>  	put_pages(ns);
>  	return tx;

I vaguely remember Adrian (CCed) did this on purpose. This is for the
case when nandsim emulates NAND flash on top of a file. So there are 2
file-systems involved: one sits on top of nandsim (e.g. UBIFS) and the
other owns the file which nandsim uses (e.g., ext3).

And I really cannot remember off the top of my head why he needed
PF_MEMALLOC, but I think Adrian wanted to prevent the direct reclaim
path to re-enter, say UBIFS, and cause deadlock. But I'd thing that all
the allocations in vfs_read()/vfs_write() should be GFP_NOFS, so that
should not be a probelm?

-- 
Best Regards,
Artem Bityutskiy (D?N?N?N?D 1/4  D?D,N?N?N?DoD,D1)

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 108+ messages in thread

* Re: [PATCH 4/7] nandsim: Don't use PF_MEMALLOC
@ 2009-11-23 15:00     ` Artem Bityutskiy
  0 siblings, 0 replies; 108+ messages in thread
From: Artem Bityutskiy @ 2009-11-23 15:00 UTC (permalink / raw)
  To: KOSAKI Motohiro
  Cc: David Woodhouse, LKML, linux-mm, linux-mtd, Andrew Morton, Adrian Hunter

On Tue, 2009-11-17 at 16:19 +0900, KOSAKI Motohiro wrote:
> Non MM subsystem must not use PF_MEMALLOC. Memory reclaim need few
> memory, anyone must not prevent it. Otherwise the system cause
> mysterious hang-up and/or OOM Killer invokation.
> 
> Cc: David Woodhouse <David.Woodhouse@intel.com>
> Cc: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
> Cc: linux-mtd@lists.infradead.org
> Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
> ---
>  drivers/mtd/nand/nandsim.c |   22 ++--------------------
>  1 files changed, 2 insertions(+), 20 deletions(-)
> 
> diff --git a/drivers/mtd/nand/nandsim.c b/drivers/mtd/nand/nandsim.c
> index cd0711b..97a8bbb 100644
> --- a/drivers/mtd/nand/nandsim.c
> +++ b/drivers/mtd/nand/nandsim.c
> @@ -1322,34 +1322,18 @@ static int get_pages(struct nandsim *ns, struct file *file, size_t count, loff_t
>  	return 0;
>  }
>  
> -static int set_memalloc(void)
> -{
> -	if (current->flags & PF_MEMALLOC)
> -		return 0;
> -	current->flags |= PF_MEMALLOC;
> -	return 1;
> -}
> -
> -static void clear_memalloc(int memalloc)
> -{
> -	if (memalloc)
> -		current->flags &= ~PF_MEMALLOC;
> -}
> -
>  static ssize_t read_file(struct nandsim *ns, struct file *file, void *buf, size_t count, loff_t *pos)
>  {
>  	mm_segment_t old_fs;
>  	ssize_t tx;
> -	int err, memalloc;
> +	int err;
>  
>  	err = get_pages(ns, file, count, *pos);
>  	if (err)
>  		return err;
>  	old_fs = get_fs();
>  	set_fs(get_ds());
> -	memalloc = set_memalloc();
>  	tx = vfs_read(file, (char __user *)buf, count, pos);
> -	clear_memalloc(memalloc);
>  	set_fs(old_fs);
>  	put_pages(ns);
>  	return tx;
> @@ -1359,16 +1343,14 @@ static ssize_t write_file(struct nandsim *ns, struct file *file, void *buf, size
>  {
>  	mm_segment_t old_fs;
>  	ssize_t tx;
> -	int err, memalloc;
> +	int err;
>  
>  	err = get_pages(ns, file, count, *pos);
>  	if (err)
>  		return err;
>  	old_fs = get_fs();
>  	set_fs(get_ds());
> -	memalloc = set_memalloc();
>  	tx = vfs_write(file, (char __user *)buf, count, pos);
> -	clear_memalloc(memalloc);
>  	set_fs(old_fs);
>  	put_pages(ns);
>  	return tx;

I vaguely remember Adrian (CCed) did this on purpose. This is for the
case when nandsim emulates NAND flash on top of a file. So there are 2
file-systems involved: one sits on top of nandsim (e.g. UBIFS) and the
other owns the file which nandsim uses (e.g., ext3).

And I really cannot remember off the top of my head why he needed
PF_MEMALLOC, but I think Adrian wanted to prevent the direct reclaim
path to re-enter, say UBIFS, and cause deadlock. But I'd thing that all
the allocations in vfs_read()/vfs_write() should be GFP_NOFS, so that
should not be a probelm?

-- 
Best Regards,
Artem Bityutskiy (Артём Битюцкий)

^ permalink raw reply	[flat|nested] 108+ messages in thread

* Re: [PATCH 4/7] nandsim: Don't use PF_MEMALLOC
  2009-11-23 15:00     ` Artem Bityutskiy
  (?)
@ 2009-11-23 20:01       ` Adrian Hunter
  -1 siblings, 0 replies; 108+ messages in thread
From: Adrian Hunter @ 2009-11-23 20:01 UTC (permalink / raw)
  To: KOSAKI Motohiro
  Cc: Bityutskiy Artem (Nokia-D/Helsinki),
	LKML, linux-mm, Andrew Morton, David Woodhouse, linux-mtd

Bityutskiy Artem (Nokia-D/Helsinki) wrote:
> On Tue, 2009-11-17 at 16:19 +0900, KOSAKI Motohiro wrote:
>> Non MM subsystem must not use PF_MEMALLOC. Memory reclaim need few
>> memory, anyone must not prevent it. Otherwise the system cause
>> mysterious hang-up and/or OOM Killer invokation.
>>
>> Cc: David Woodhouse <David.Woodhouse@intel.com>
>> Cc: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
>> Cc: linux-mtd@lists.infradead.org
>> Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
>> ---
>>  drivers/mtd/nand/nandsim.c |   22 ++--------------------
>>  1 files changed, 2 insertions(+), 20 deletions(-)
>>
>> diff --git a/drivers/mtd/nand/nandsim.c b/drivers/mtd/nand/nandsim.c
>> index cd0711b..97a8bbb 100644
>> --- a/drivers/mtd/nand/nandsim.c
>> +++ b/drivers/mtd/nand/nandsim.c
>> @@ -1322,34 +1322,18 @@ static int get_pages(struct nandsim *ns, struct file *file, size_t count, loff_t
>>  	return 0;
>>  }
>>  
>> -static int set_memalloc(void)
>> -{
>> -	if (current->flags & PF_MEMALLOC)
>> -		return 0;
>> -	current->flags |= PF_MEMALLOC;
>> -	return 1;
>> -}
>> -
>> -static void clear_memalloc(int memalloc)
>> -{
>> -	if (memalloc)
>> -		current->flags &= ~PF_MEMALLOC;
>> -}
>> -
>>  static ssize_t read_file(struct nandsim *ns, struct file *file, void *buf, size_t count, loff_t *pos)
>>  {
>>  	mm_segment_t old_fs;
>>  	ssize_t tx;
>> -	int err, memalloc;
>> +	int err;
>>  
>>  	err = get_pages(ns, file, count, *pos);
>>  	if (err)
>>  		return err;
>>  	old_fs = get_fs();
>>  	set_fs(get_ds());
>> -	memalloc = set_memalloc();
>>  	tx = vfs_read(file, (char __user *)buf, count, pos);
>> -	clear_memalloc(memalloc);
>>  	set_fs(old_fs);
>>  	put_pages(ns);
>>  	return tx;
>> @@ -1359,16 +1343,14 @@ static ssize_t write_file(struct nandsim *ns, struct file *file, void *buf, size
>>  {
>>  	mm_segment_t old_fs;
>>  	ssize_t tx;
>> -	int err, memalloc;
>> +	int err;
>>  
>>  	err = get_pages(ns, file, count, *pos);
>>  	if (err)
>>  		return err;
>>  	old_fs = get_fs();
>>  	set_fs(get_ds());
>> -	memalloc = set_memalloc();
>>  	tx = vfs_write(file, (char __user *)buf, count, pos);
>> -	clear_memalloc(memalloc);
>>  	set_fs(old_fs);
>>  	put_pages(ns);
>>  	return tx;PF_MEMALLOC,
> 
> I vaguely remember Adrian (CCed) did this on purpose. This is for the
> case when nandsim emulates NAND flash on top of a file. So there are 2
> file-systems involved: one sits on top of nandsim (e.g. UBIFS) and the
> other owns the file which nandsim uses (e.g., ext3).
> 
> And I really cannot remember off the top of my head why he needed
> PF_MEMALLOC, but I think Adrian wanted to prevent the direct reclaim
> path to re-enter, say UBIFS, and cause deadlock. But I'd thing that all
> the allocations in vfs_read()/vfs_write() should be GFP_NOFS, so that
> should not be a probelm?
> 

Yes it needs PF_MEMALLOC to prevent deadlock because there can be a
file system on top of nandsim which, in this case, is on top of another
file system.

I do not see how mempools will help here.

Please offer an alternative solution.

^ permalink raw reply	[flat|nested] 108+ messages in thread

* Re: [PATCH 4/7] nandsim: Don't use PF_MEMALLOC
@ 2009-11-23 20:01       ` Adrian Hunter
  0 siblings, 0 replies; 108+ messages in thread
From: Adrian Hunter @ 2009-11-23 20:01 UTC (permalink / raw)
  To: KOSAKI Motohiro
  Cc: Bityutskiy Artem (Nokia-D/Helsinki),
	LKML, linux-mm, Andrew Morton, David Woodhouse, linux-mtd

Bityutskiy Artem (Nokia-D/Helsinki) wrote:
> On Tue, 2009-11-17 at 16:19 +0900, KOSAKI Motohiro wrote:
>> Non MM subsystem must not use PF_MEMALLOC. Memory reclaim need few
>> memory, anyone must not prevent it. Otherwise the system cause
>> mysterious hang-up and/or OOM Killer invokation.
>>
>> Cc: David Woodhouse <David.Woodhouse@intel.com>
>> Cc: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
>> Cc: linux-mtd@lists.infradead.org
>> Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
>> ---
>>  drivers/mtd/nand/nandsim.c |   22 ++--------------------
>>  1 files changed, 2 insertions(+), 20 deletions(-)
>>
>> diff --git a/drivers/mtd/nand/nandsim.c b/drivers/mtd/nand/nandsim.c
>> index cd0711b..97a8bbb 100644
>> --- a/drivers/mtd/nand/nandsim.c
>> +++ b/drivers/mtd/nand/nandsim.c
>> @@ -1322,34 +1322,18 @@ static int get_pages(struct nandsim *ns, struct file *file, size_t count, loff_t
>>  	return 0;
>>  }
>>  
>> -static int set_memalloc(void)
>> -{
>> -	if (current->flags & PF_MEMALLOC)
>> -		return 0;
>> -	current->flags |= PF_MEMALLOC;
>> -	return 1;
>> -}
>> -
>> -static void clear_memalloc(int memalloc)
>> -{
>> -	if (memalloc)
>> -		current->flags &= ~PF_MEMALLOC;
>> -}
>> -
>>  static ssize_t read_file(struct nandsim *ns, struct file *file, void *buf, size_t count, loff_t *pos)
>>  {
>>  	mm_segment_t old_fs;
>>  	ssize_t tx;
>> -	int err, memalloc;
>> +	int err;
>>  
>>  	err = get_pages(ns, file, count, *pos);
>>  	if (err)
>>  		return err;
>>  	old_fs = get_fs();
>>  	set_fs(get_ds());
>> -	memalloc = set_memalloc();
>>  	tx = vfs_read(file, (char __user *)buf, count, pos);
>> -	clear_memalloc(memalloc);
>>  	set_fs(old_fs);
>>  	put_pages(ns);
>>  	return tx;
>> @@ -1359,16 +1343,14 @@ static ssize_t write_file(struct nandsim *ns, struct file *file, void *buf, size
>>  {
>>  	mm_segment_t old_fs;
>>  	ssize_t tx;
>> -	int err, memalloc;
>> +	int err;
>>  
>>  	err = get_pages(ns, file, count, *pos);
>>  	if (err)
>>  		return err;
>>  	old_fs = get_fs();
>>  	set_fs(get_ds());
>> -	memalloc = set_memalloc();
>>  	tx = vfs_write(file, (char __user *)buf, count, pos);
>> -	clear_memalloc(memalloc);
>>  	set_fs(old_fs);
>>  	put_pages(ns);
>>  	return tx;PF_MEMALLOC,
> 
> I vaguely remember Adrian (CCed) did this on purpose. This is for the
> case when nandsim emulates NAND flash on top of a file. So there are 2
> file-systems involved: one sits on top of nandsim (e.g. UBIFS) and the
> other owns the file which nandsim uses (e.g., ext3).
> 
> And I really cannot remember off the top of my head why he needed
> PF_MEMALLOC, but I think Adrian wanted to prevent the direct reclaim
> path to re-enter, say UBIFS, and cause deadlock. But I'd thing that all
> the allocations in vfs_read()/vfs_write() should be GFP_NOFS, so that
> should not be a probelm?
> 

Yes it needs PF_MEMALLOC to prevent deadlock because there can be a
file system on top of nandsim which, in this case, is on top of another
file system.

I do not see how mempools will help here.

Please offer an alternative solution.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 108+ messages in thread

* Re: [PATCH 4/7] nandsim: Don't use PF_MEMALLOC
@ 2009-11-23 20:01       ` Adrian Hunter
  0 siblings, 0 replies; 108+ messages in thread
From: Adrian Hunter @ 2009-11-23 20:01 UTC (permalink / raw)
  To: KOSAKI Motohiro
  Cc: Bityutskiy Artem (Nokia-D/Helsinki),
	David Woodhouse, LKML, linux-mm, linux-mtd, Andrew Morton

Bityutskiy Artem (Nokia-D/Helsinki) wrote:
> On Tue, 2009-11-17 at 16:19 +0900, KOSAKI Motohiro wrote:
>> Non MM subsystem must not use PF_MEMALLOC. Memory reclaim need few
>> memory, anyone must not prevent it. Otherwise the system cause
>> mysterious hang-up and/or OOM Killer invokation.
>>
>> Cc: David Woodhouse <David.Woodhouse@intel.com>
>> Cc: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
>> Cc: linux-mtd@lists.infradead.org
>> Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
>> ---
>>  drivers/mtd/nand/nandsim.c |   22 ++--------------------
>>  1 files changed, 2 insertions(+), 20 deletions(-)
>>
>> diff --git a/drivers/mtd/nand/nandsim.c b/drivers/mtd/nand/nandsim.c
>> index cd0711b..97a8bbb 100644
>> --- a/drivers/mtd/nand/nandsim.c
>> +++ b/drivers/mtd/nand/nandsim.c
>> @@ -1322,34 +1322,18 @@ static int get_pages(struct nandsim *ns, struct file *file, size_t count, loff_t
>>  	return 0;
>>  }
>>  
>> -static int set_memalloc(void)
>> -{
>> -	if (current->flags & PF_MEMALLOC)
>> -		return 0;
>> -	current->flags |= PF_MEMALLOC;
>> -	return 1;
>> -}
>> -
>> -static void clear_memalloc(int memalloc)
>> -{
>> -	if (memalloc)
>> -		current->flags &= ~PF_MEMALLOC;
>> -}
>> -
>>  static ssize_t read_file(struct nandsim *ns, struct file *file, void *buf, size_t count, loff_t *pos)
>>  {
>>  	mm_segment_t old_fs;
>>  	ssize_t tx;
>> -	int err, memalloc;
>> +	int err;
>>  
>>  	err = get_pages(ns, file, count, *pos);
>>  	if (err)
>>  		return err;
>>  	old_fs = get_fs();
>>  	set_fs(get_ds());
>> -	memalloc = set_memalloc();
>>  	tx = vfs_read(file, (char __user *)buf, count, pos);
>> -	clear_memalloc(memalloc);
>>  	set_fs(old_fs);
>>  	put_pages(ns);
>>  	return tx;
>> @@ -1359,16 +1343,14 @@ static ssize_t write_file(struct nandsim *ns, struct file *file, void *buf, size
>>  {
>>  	mm_segment_t old_fs;
>>  	ssize_t tx;
>> -	int err, memalloc;
>> +	int err;
>>  
>>  	err = get_pages(ns, file, count, *pos);
>>  	if (err)
>>  		return err;
>>  	old_fs = get_fs();
>>  	set_fs(get_ds());
>> -	memalloc = set_memalloc();
>>  	tx = vfs_write(file, (char __user *)buf, count, pos);
>> -	clear_memalloc(memalloc);
>>  	set_fs(old_fs);
>>  	put_pages(ns);
>>  	return tx;PF_MEMALLOC,
> 
> I vaguely remember Adrian (CCed) did this on purpose. This is for the
> case when nandsim emulates NAND flash on top of a file. So there are 2
> file-systems involved: one sits on top of nandsim (e.g. UBIFS) and the
> other owns the file which nandsim uses (e.g., ext3).
> 
> And I really cannot remember off the top of my head why he needed
> PF_MEMALLOC, but I think Adrian wanted to prevent the direct reclaim
> path to re-enter, say UBIFS, and cause deadlock. But I'd thing that all
> the allocations in vfs_read()/vfs_write() should be GFP_NOFS, so that
> should not be a probelm?
> 

Yes it needs PF_MEMALLOC to prevent deadlock because there can be a
file system on top of nandsim which, in this case, is on top of another
file system.

I do not see how mempools will help here.

Please offer an alternative solution.

^ permalink raw reply	[flat|nested] 108+ messages in thread

* Re: [PATCH 4/7] nandsim: Don't use PF_MEMALLOC
  2009-11-23 20:01       ` Adrian Hunter
  (?)
@ 2009-11-24 10:46         ` KOSAKI Motohiro
  -1 siblings, 0 replies; 108+ messages in thread
From: KOSAKI Motohiro @ 2009-11-24 10:46 UTC (permalink / raw)
  To: Adrian Hunter
  Cc: kosaki.motohiro, Bityutskiy Artem (Nokia-D/Helsinki),
	LKML, linux-mm, Andrew Morton, David Woodhouse, linux-mtd

Hi

Thank you for this useful comments.

> > I vaguely remember Adrian (CCed) did this on purpose. This is for the
> > case when nandsim emulates NAND flash on top of a file. So there are 2
> > file-systems involved: one sits on top of nandsim (e.g. UBIFS) and the
> > other owns the file which nandsim uses (e.g., ext3).
> > 
> > And I really cannot remember off the top of my head why he needed
> > PF_MEMALLOC, but I think Adrian wanted to prevent the direct reclaim
> > path to re-enter, say UBIFS, and cause deadlock. But I'd thing that all
> > the allocations in vfs_read()/vfs_write() should be GFP_NOFS, so that
> > should not be a probelm?
> > 
> 
> Yes it needs PF_MEMALLOC to prevent deadlock because there can be a
> file system on top of nandsim which, in this case, is on top of another
> file system.
> 
> I do not see how mempools will help here.
> 
> Please offer an alternative solution.

I have few questions.

Can you please explain more detail? Another stackable filesystam
(e.g. ecryptfs) don't have such problem. Why nandsim have its issue?
What lock cause deadlock?




^ permalink raw reply	[flat|nested] 108+ messages in thread

* Re: [PATCH 4/7] nandsim: Don't use PF_MEMALLOC
@ 2009-11-24 10:46         ` KOSAKI Motohiro
  0 siblings, 0 replies; 108+ messages in thread
From: KOSAKI Motohiro @ 2009-11-24 10:46 UTC (permalink / raw)
  To: Adrian Hunter
  Cc: kosaki.motohiro, Bityutskiy Artem (Nokia-D/Helsinki),
	LKML, linux-mm, Andrew Morton, David Woodhouse, linux-mtd

Hi

Thank you for this useful comments.

> > I vaguely remember Adrian (CCed) did this on purpose. This is for the
> > case when nandsim emulates NAND flash on top of a file. So there are 2
> > file-systems involved: one sits on top of nandsim (e.g. UBIFS) and the
> > other owns the file which nandsim uses (e.g., ext3).
> > 
> > And I really cannot remember off the top of my head why he needed
> > PF_MEMALLOC, but I think Adrian wanted to prevent the direct reclaim
> > path to re-enter, say UBIFS, and cause deadlock. But I'd thing that all
> > the allocations in vfs_read()/vfs_write() should be GFP_NOFS, so that
> > should not be a probelm?
> > 
> 
> Yes it needs PF_MEMALLOC to prevent deadlock because there can be a
> file system on top of nandsim which, in this case, is on top of another
> file system.
> 
> I do not see how mempools will help here.
> 
> Please offer an alternative solution.

I have few questions.

Can you please explain more detail? Another stackable filesystam
(e.g. ecryptfs) don't have such problem. Why nandsim have its issue?
What lock cause deadlock?



--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 108+ messages in thread

* Re: [PATCH 4/7] nandsim: Don't use PF_MEMALLOC
@ 2009-11-24 10:46         ` KOSAKI Motohiro
  0 siblings, 0 replies; 108+ messages in thread
From: KOSAKI Motohiro @ 2009-11-24 10:46 UTC (permalink / raw)
  To: Adrian Hunter
  Cc: Bityutskiy Artem (Nokia-D/Helsinki),
	David Woodhouse, LKML, linux-mm, linux-mtd, kosaki.motohiro,
	Andrew Morton

Hi

Thank you for this useful comments.

> > I vaguely remember Adrian (CCed) did this on purpose. This is for the
> > case when nandsim emulates NAND flash on top of a file. So there are 2
> > file-systems involved: one sits on top of nandsim (e.g. UBIFS) and the
> > other owns the file which nandsim uses (e.g., ext3).
> > 
> > And I really cannot remember off the top of my head why he needed
> > PF_MEMALLOC, but I think Adrian wanted to prevent the direct reclaim
> > path to re-enter, say UBIFS, and cause deadlock. But I'd thing that all
> > the allocations in vfs_read()/vfs_write() should be GFP_NOFS, so that
> > should not be a probelm?
> > 
> 
> Yes it needs PF_MEMALLOC to prevent deadlock because there can be a
> file system on top of nandsim which, in this case, is on top of another
> file system.
> 
> I do not see how mempools will help here.
> 
> Please offer an alternative solution.

I have few questions.

Can you please explain more detail? Another stackable filesystam
(e.g. ecryptfs) don't have such problem. Why nandsim have its issue?
What lock cause deadlock?

^ permalink raw reply	[flat|nested] 108+ messages in thread

* Re: [PATCH 4/7] nandsim: Don't use PF_MEMALLOC
  2009-11-24 10:46         ` KOSAKI Motohiro
  (?)
@ 2009-11-24 11:56           ` Adrian Hunter
  -1 siblings, 0 replies; 108+ messages in thread
From: Adrian Hunter @ 2009-11-24 11:56 UTC (permalink / raw)
  To: ext KOSAKI Motohiro
  Cc: Bityutskiy Artem (Nokia-D/Helsinki),
	LKML, linux-mm, Andrew Morton, David Woodhouse, linux-mtd

ext KOSAKI Motohiro wrote:
> Hi
> 
> Thank you for this useful comments.
> 
>>> I vaguely remember Adrian (CCed) did this on purpose. This is for the
>>> case when nandsim emulates NAND flash on top of a file. So there are 2
>>> file-systems involved: one sits on top of nandsim (e.g. UBIFS) and the
>>> other owns the file which nandsim uses (e.g., ext3).
>>>
>>> And I really cannot remember off the top of my head why he needed
>>> PF_MEMALLOC, but I think Adrian wanted to prevent the direct reclaim
>>> path to re-enter, say UBIFS, and cause deadlock. But I'd thing that all
>>> the allocations in vfs_read()/vfs_write() should be GFP_NOFS, so that
>>> should not be a probelm?
>>>
>> Yes it needs PF_MEMALLOC to prevent deadlock because there can be a
>> file system on top of nandsim which, in this case, is on top of another
>> file system.
>>
>> I do not see how mempools will help here.
>>
>> Please offer an alternative solution.
> 
> I have few questions.
> 
> Can you please explain more detail? Another stackable filesystam
> (e.g. ecryptfs) don't have such problem. Why nandsim have its issue?
> What lock cause deadlock?
> 
> 
> 

The file systems are not stacked.  One is over nandsim, which nandsim
does not know about because it is just a lowly NAND device, and, with
the file cache option, one file system below to provide the file cache.

The deadlock is the kernel writing out dirty pages to the top file system
which writes to nandsim which writes to the bottom file system which
allocates memory which causes dirty pages to be written out to the top
file system, which tries to write to nandsim => deadlock.



^ permalink raw reply	[flat|nested] 108+ messages in thread

* Re: [PATCH 4/7] nandsim: Don't use PF_MEMALLOC
@ 2009-11-24 11:56           ` Adrian Hunter
  0 siblings, 0 replies; 108+ messages in thread
From: Adrian Hunter @ 2009-11-24 11:56 UTC (permalink / raw)
  To: ext KOSAKI Motohiro
  Cc: Bityutskiy Artem (Nokia-D/Helsinki),
	LKML, linux-mm, Andrew Morton, David Woodhouse, linux-mtd

ext KOSAKI Motohiro wrote:
> Hi
> 
> Thank you for this useful comments.
> 
>>> I vaguely remember Adrian (CCed) did this on purpose. This is for the
>>> case when nandsim emulates NAND flash on top of a file. So there are 2
>>> file-systems involved: one sits on top of nandsim (e.g. UBIFS) and the
>>> other owns the file which nandsim uses (e.g., ext3).
>>>
>>> And I really cannot remember off the top of my head why he needed
>>> PF_MEMALLOC, but I think Adrian wanted to prevent the direct reclaim
>>> path to re-enter, say UBIFS, and cause deadlock. But I'd thing that all
>>> the allocations in vfs_read()/vfs_write() should be GFP_NOFS, so that
>>> should not be a probelm?
>>>
>> Yes it needs PF_MEMALLOC to prevent deadlock because there can be a
>> file system on top of nandsim which, in this case, is on top of another
>> file system.
>>
>> I do not see how mempools will help here.
>>
>> Please offer an alternative solution.
> 
> I have few questions.
> 
> Can you please explain more detail? Another stackable filesystam
> (e.g. ecryptfs) don't have such problem. Why nandsim have its issue?
> What lock cause deadlock?
> 
> 
> 

The file systems are not stacked.  One is over nandsim, which nandsim
does not know about because it is just a lowly NAND device, and, with
the file cache option, one file system below to provide the file cache.

The deadlock is the kernel writing out dirty pages to the top file system
which writes to nandsim which writes to the bottom file system which
allocates memory which causes dirty pages to be written out to the top
file system, which tries to write to nandsim => deadlock.


--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 108+ messages in thread

* Re: [PATCH 4/7] nandsim: Don't use PF_MEMALLOC
@ 2009-11-24 11:56           ` Adrian Hunter
  0 siblings, 0 replies; 108+ messages in thread
From: Adrian Hunter @ 2009-11-24 11:56 UTC (permalink / raw)
  To: ext KOSAKI Motohiro
  Cc: Bityutskiy Artem (Nokia-D/Helsinki),
	David Woodhouse, LKML, linux-mm, linux-mtd, Andrew Morton

ext KOSAKI Motohiro wrote:
> Hi
> 
> Thank you for this useful comments.
> 
>>> I vaguely remember Adrian (CCed) did this on purpose. This is for the
>>> case when nandsim emulates NAND flash on top of a file. So there are 2
>>> file-systems involved: one sits on top of nandsim (e.g. UBIFS) and the
>>> other owns the file which nandsim uses (e.g., ext3).
>>>
>>> And I really cannot remember off the top of my head why he needed
>>> PF_MEMALLOC, but I think Adrian wanted to prevent the direct reclaim
>>> path to re-enter, say UBIFS, and cause deadlock. But I'd thing that all
>>> the allocations in vfs_read()/vfs_write() should be GFP_NOFS, so that
>>> should not be a probelm?
>>>
>> Yes it needs PF_MEMALLOC to prevent deadlock because there can be a
>> file system on top of nandsim which, in this case, is on top of another
>> file system.
>>
>> I do not see how mempools will help here.
>>
>> Please offer an alternative solution.
> 
> I have few questions.
> 
> Can you please explain more detail? Another stackable filesystam
> (e.g. ecryptfs) don't have such problem. Why nandsim have its issue?
> What lock cause deadlock?
> 
> 
> 

The file systems are not stacked.  One is over nandsim, which nandsim
does not know about because it is just a lowly NAND device, and, with
the file cache option, one file system below to provide the file cache.

The deadlock is the kernel writing out dirty pages to the top file system
which writes to nandsim which writes to the bottom file system which
allocates memory which causes dirty pages to be written out to the top
file system, which tries to write to nandsim => deadlock.

^ permalink raw reply	[flat|nested] 108+ messages in thread

* Re: [PATCH 4/7] nandsim: Don't use PF_MEMALLOC
  2009-11-24 11:56           ` Adrian Hunter
  (?)
@ 2009-11-25  0:42             ` KOSAKI Motohiro
  -1 siblings, 0 replies; 108+ messages in thread
From: KOSAKI Motohiro @ 2009-11-25  0:42 UTC (permalink / raw)
  To: Adrian Hunter
  Cc: kosaki.motohiro, Bityutskiy Artem (Nokia-D/Helsinki),
	LKML, linux-mm, Andrew Morton, David Woodhouse, linux-mtd

> ext KOSAKI Motohiro wrote:
> > Hi
> > 
> > Thank you for this useful comments.
> > 
> >>> I vaguely remember Adrian (CCed) did this on purpose. This is for the
> >>> case when nandsim emulates NAND flash on top of a file. So there are 2
> >>> file-systems involved: one sits on top of nandsim (e.g. UBIFS) and the
> >>> other owns the file which nandsim uses (e.g., ext3).
> >>>
> >>> And I really cannot remember off the top of my head why he needed
> >>> PF_MEMALLOC, but I think Adrian wanted to prevent the direct reclaim
> >>> path to re-enter, say UBIFS, and cause deadlock. But I'd thing that all
> >>> the allocations in vfs_read()/vfs_write() should be GFP_NOFS, so that
> >>> should not be a probelm?
> >>>
> >> Yes it needs PF_MEMALLOC to prevent deadlock because there can be a
> >> file system on top of nandsim which, in this case, is on top of another
> >> file system.
> >>
> >> I do not see how mempools will help here.
> >>
> >> Please offer an alternative solution.
> > 
> > I have few questions.
> > 
> > Can you please explain more detail? Another stackable filesystam
> > (e.g. ecryptfs) don't have such problem. Why nandsim have its issue?
> > What lock cause deadlock?
> 
> The file systems are not stacked.  One is over nandsim, which nandsim
> does not know about because it is just a lowly NAND device, and, with
> the file cache option, one file system below to provide the file cache.
> 
> The deadlock is the kernel writing out dirty pages to the top file system
> which writes to nandsim which writes to the bottom file system which
> allocates memory which causes dirty pages to be written out to the top
> file system, which tries to write to nandsim => deadlock.

You mean you want to prevent pageout() instead reclaim itself?
Dropping filecache seems don't make recursive call, right?




^ permalink raw reply	[flat|nested] 108+ messages in thread

* Re: [PATCH 4/7] nandsim: Don't use PF_MEMALLOC
@ 2009-11-25  0:42             ` KOSAKI Motohiro
  0 siblings, 0 replies; 108+ messages in thread
From: KOSAKI Motohiro @ 2009-11-25  0:42 UTC (permalink / raw)
  To: Adrian Hunter
  Cc: kosaki.motohiro, Bityutskiy Artem (Nokia-D/Helsinki),
	LKML, linux-mm, Andrew Morton, David Woodhouse, linux-mtd

> ext KOSAKI Motohiro wrote:
> > Hi
> > 
> > Thank you for this useful comments.
> > 
> >>> I vaguely remember Adrian (CCed) did this on purpose. This is for the
> >>> case when nandsim emulates NAND flash on top of a file. So there are 2
> >>> file-systems involved: one sits on top of nandsim (e.g. UBIFS) and the
> >>> other owns the file which nandsim uses (e.g., ext3).
> >>>
> >>> And I really cannot remember off the top of my head why he needed
> >>> PF_MEMALLOC, but I think Adrian wanted to prevent the direct reclaim
> >>> path to re-enter, say UBIFS, and cause deadlock. But I'd thing that all
> >>> the allocations in vfs_read()/vfs_write() should be GFP_NOFS, so that
> >>> should not be a probelm?
> >>>
> >> Yes it needs PF_MEMALLOC to prevent deadlock because there can be a
> >> file system on top of nandsim which, in this case, is on top of another
> >> file system.
> >>
> >> I do not see how mempools will help here.
> >>
> >> Please offer an alternative solution.
> > 
> > I have few questions.
> > 
> > Can you please explain more detail? Another stackable filesystam
> > (e.g. ecryptfs) don't have such problem. Why nandsim have its issue?
> > What lock cause deadlock?
> 
> The file systems are not stacked.  One is over nandsim, which nandsim
> does not know about because it is just a lowly NAND device, and, with
> the file cache option, one file system below to provide the file cache.
> 
> The deadlock is the kernel writing out dirty pages to the top file system
> which writes to nandsim which writes to the bottom file system which
> allocates memory which causes dirty pages to be written out to the top
> file system, which tries to write to nandsim => deadlock.

You mean you want to prevent pageout() instead reclaim itself?
Dropping filecache seems don't make recursive call, right?



--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 108+ messages in thread

* Re: [PATCH 4/7] nandsim: Don't use PF_MEMALLOC
@ 2009-11-25  0:42             ` KOSAKI Motohiro
  0 siblings, 0 replies; 108+ messages in thread
From: KOSAKI Motohiro @ 2009-11-25  0:42 UTC (permalink / raw)
  To: Adrian Hunter
  Cc: Bityutskiy Artem (Nokia-D/Helsinki),
	David Woodhouse, LKML, linux-mm, linux-mtd, kosaki.motohiro,
	Andrew Morton

> ext KOSAKI Motohiro wrote:
> > Hi
> > 
> > Thank you for this useful comments.
> > 
> >>> I vaguely remember Adrian (CCed) did this on purpose. This is for the
> >>> case when nandsim emulates NAND flash on top of a file. So there are 2
> >>> file-systems involved: one sits on top of nandsim (e.g. UBIFS) and the
> >>> other owns the file which nandsim uses (e.g., ext3).
> >>>
> >>> And I really cannot remember off the top of my head why he needed
> >>> PF_MEMALLOC, but I think Adrian wanted to prevent the direct reclaim
> >>> path to re-enter, say UBIFS, and cause deadlock. But I'd thing that all
> >>> the allocations in vfs_read()/vfs_write() should be GFP_NOFS, so that
> >>> should not be a probelm?
> >>>
> >> Yes it needs PF_MEMALLOC to prevent deadlock because there can be a
> >> file system on top of nandsim which, in this case, is on top of another
> >> file system.
> >>
> >> I do not see how mempools will help here.
> >>
> >> Please offer an alternative solution.
> > 
> > I have few questions.
> > 
> > Can you please explain more detail? Another stackable filesystam
> > (e.g. ecryptfs) don't have such problem. Why nandsim have its issue?
> > What lock cause deadlock?
> 
> The file systems are not stacked.  One is over nandsim, which nandsim
> does not know about because it is just a lowly NAND device, and, with
> the file cache option, one file system below to provide the file cache.
> 
> The deadlock is the kernel writing out dirty pages to the top file system
> which writes to nandsim which writes to the bottom file system which
> allocates memory which causes dirty pages to be written out to the top
> file system, which tries to write to nandsim => deadlock.

You mean you want to prevent pageout() instead reclaim itself?
Dropping filecache seems don't make recursive call, right?

^ permalink raw reply	[flat|nested] 108+ messages in thread

* Re: [PATCH 4/7] nandsim: Don't use PF_MEMALLOC
  2009-11-25  0:42             ` KOSAKI Motohiro
  (?)
@ 2009-11-25  7:13               ` Adrian Hunter
  -1 siblings, 0 replies; 108+ messages in thread
From: Adrian Hunter @ 2009-11-25  7:13 UTC (permalink / raw)
  To: KOSAKI Motohiro
  Cc: Bityutskiy Artem (Nokia-D/Helsinki),
	LKML, linux-mm, Andrew Morton, David Woodhouse, linux-mtd

KOSAKI Motohiro wrote:
>> KOSAKI Motohiro wrote:
>>> Hi
>>>
>>> Thank you for this useful comments.
>>>
>>>>> I vaguely remember Adrian (CCed) did this on purpose. This is for the
>>>>> case when nandsim emulates NAND flash on top of a file. So there are 2
>>>>> file-systems involved: one sits on top of nandsim (e.g. UBIFS) and the
>>>>> other owns the file which nandsim uses (e.g., ext3).
>>>>>
>>>>> And I really cannot remember off the top of my head why he needed
>>>>> PF_MEMALLOC, but I think Adrian wanted to prevent the direct reclaim
>>>>> path to re-enter, say UBIFS, and cause deadlock. But I'd thing that all
>>>>> the allocations in vfs_read()/vfs_write() should be GFP_NOFS, so that
>>>>> should not be a probelm?
>>>>>
>>>> Yes it needs PF_MEMALLOC to prevent deadlock because there can be a
>>>> file system on top of nandsim which, in this case, is on top of another
>>>> file system.
>>>>
>>>> I do not see how mempools will help here.
>>>>
>>>> Please offer an alternative solution.
>>> I have few questions.
>>>
>>> Can you please explain more detail? Another stackable filesystam
>>> (e.g. ecryptfs) don't have such problem. Why nandsim have its issue?
>>> What lock cause deadlock?
>> The file systems are not stacked.  One is over nandsim, which nandsim
>> does not know about because it is just a lowly NAND device, and, with
>> the file cache option, one file system below to provide the file cache.
>>
>> The deadlock is the kernel writing out dirty pages to the top file system
>> which writes to nandsim which writes to the bottom file system which
>> allocates memory which causes dirty pages to be written out to the top
>> file system, which tries to write to nandsim => deadlock.
> 
> You mean you want to prevent pageout() instead reclaim itself?

Yes

> Dropping filecache seems don't make recursive call, right?

Yes


^ permalink raw reply	[flat|nested] 108+ messages in thread

* Re: [PATCH 4/7] nandsim: Don't use PF_MEMALLOC
@ 2009-11-25  7:13               ` Adrian Hunter
  0 siblings, 0 replies; 108+ messages in thread
From: Adrian Hunter @ 2009-11-25  7:13 UTC (permalink / raw)
  To: KOSAKI Motohiro
  Cc: Bityutskiy Artem (Nokia-D/Helsinki),
	LKML, linux-mm, Andrew Morton, David Woodhouse, linux-mtd

KOSAKI Motohiro wrote:
>> KOSAKI Motohiro wrote:
>>> Hi
>>>
>>> Thank you for this useful comments.
>>>
>>>>> I vaguely remember Adrian (CCed) did this on purpose. This is for the
>>>>> case when nandsim emulates NAND flash on top of a file. So there are 2
>>>>> file-systems involved: one sits on top of nandsim (e.g. UBIFS) and the
>>>>> other owns the file which nandsim uses (e.g., ext3).
>>>>>
>>>>> And I really cannot remember off the top of my head why he needed
>>>>> PF_MEMALLOC, but I think Adrian wanted to prevent the direct reclaim
>>>>> path to re-enter, say UBIFS, and cause deadlock. But I'd thing that all
>>>>> the allocations in vfs_read()/vfs_write() should be GFP_NOFS, so that
>>>>> should not be a probelm?
>>>>>
>>>> Yes it needs PF_MEMALLOC to prevent deadlock because there can be a
>>>> file system on top of nandsim which, in this case, is on top of another
>>>> file system.
>>>>
>>>> I do not see how mempools will help here.
>>>>
>>>> Please offer an alternative solution.
>>> I have few questions.
>>>
>>> Can you please explain more detail? Another stackable filesystam
>>> (e.g. ecryptfs) don't have such problem. Why nandsim have its issue?
>>> What lock cause deadlock?
>> The file systems are not stacked.  One is over nandsim, which nandsim
>> does not know about because it is just a lowly NAND device, and, with
>> the file cache option, one file system below to provide the file cache.
>>
>> The deadlock is the kernel writing out dirty pages to the top file system
>> which writes to nandsim which writes to the bottom file system which
>> allocates memory which causes dirty pages to be written out to the top
>> file system, which tries to write to nandsim => deadlock.
> 
> You mean you want to prevent pageout() instead reclaim itself?

Yes

> Dropping filecache seems don't make recursive call, right?

Yes

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 108+ messages in thread

* Re: [PATCH 4/7] nandsim: Don't use PF_MEMALLOC
@ 2009-11-25  7:13               ` Adrian Hunter
  0 siblings, 0 replies; 108+ messages in thread
From: Adrian Hunter @ 2009-11-25  7:13 UTC (permalink / raw)
  To: KOSAKI Motohiro
  Cc: Bityutskiy Artem (Nokia-D/Helsinki),
	David Woodhouse, LKML, linux-mm, linux-mtd, Andrew Morton

KOSAKI Motohiro wrote:
>> KOSAKI Motohiro wrote:
>>> Hi
>>>
>>> Thank you for this useful comments.
>>>
>>>>> I vaguely remember Adrian (CCed) did this on purpose. This is for the
>>>>> case when nandsim emulates NAND flash on top of a file. So there are 2
>>>>> file-systems involved: one sits on top of nandsim (e.g. UBIFS) and the
>>>>> other owns the file which nandsim uses (e.g., ext3).
>>>>>
>>>>> And I really cannot remember off the top of my head why he needed
>>>>> PF_MEMALLOC, but I think Adrian wanted to prevent the direct reclaim
>>>>> path to re-enter, say UBIFS, and cause deadlock. But I'd thing that all
>>>>> the allocations in vfs_read()/vfs_write() should be GFP_NOFS, so that
>>>>> should not be a probelm?
>>>>>
>>>> Yes it needs PF_MEMALLOC to prevent deadlock because there can be a
>>>> file system on top of nandsim which, in this case, is on top of another
>>>> file system.
>>>>
>>>> I do not see how mempools will help here.
>>>>
>>>> Please offer an alternative solution.
>>> I have few questions.
>>>
>>> Can you please explain more detail? Another stackable filesystam
>>> (e.g. ecryptfs) don't have such problem. Why nandsim have its issue?
>>> What lock cause deadlock?
>> The file systems are not stacked.  One is over nandsim, which nandsim
>> does not know about because it is just a lowly NAND device, and, with
>> the file cache option, one file system below to provide the file cache.
>>
>> The deadlock is the kernel writing out dirty pages to the top file system
>> which writes to nandsim which writes to the bottom file system which
>> allocates memory which causes dirty pages to be written out to the top
>> file system, which tries to write to nandsim => deadlock.
> 
> You mean you want to prevent pageout() instead reclaim itself?

Yes

> Dropping filecache seems don't make recursive call, right?

Yes

^ permalink raw reply	[flat|nested] 108+ messages in thread

* Re: [PATCH 4/7] nandsim: Don't use PF_MEMALLOC
  2009-11-25  7:13               ` Adrian Hunter
  (?)
@ 2009-11-25  7:18                 ` KOSAKI Motohiro
  -1 siblings, 0 replies; 108+ messages in thread
From: KOSAKI Motohiro @ 2009-11-25  7:18 UTC (permalink / raw)
  To: Adrian Hunter
  Cc: kosaki.motohiro, Bityutskiy Artem (Nokia-D/Helsinki),
	LKML, linux-mm, Andrew Morton, David Woodhouse, linux-mtd

> KOSAKI Motohiro wrote:
> >> KOSAKI Motohiro wrote:
> >>> Hi
> >>>
> >>> Thank you for this useful comments.
> >>>
> >>>>> I vaguely remember Adrian (CCed) did this on purpose. This is for the
> >>>>> case when nandsim emulates NAND flash on top of a file. So there are 2
> >>>>> file-systems involved: one sits on top of nandsim (e.g. UBIFS) and the
> >>>>> other owns the file which nandsim uses (e.g., ext3).
> >>>>>
> >>>>> And I really cannot remember off the top of my head why he needed
> >>>>> PF_MEMALLOC, but I think Adrian wanted to prevent the direct reclaim
> >>>>> path to re-enter, say UBIFS, and cause deadlock. But I'd thing that all
> >>>>> the allocations in vfs_read()/vfs_write() should be GFP_NOFS, so that
> >>>>> should not be a probelm?
> >>>>>
> >>>> Yes it needs PF_MEMALLOC to prevent deadlock because there can be a
> >>>> file system on top of nandsim which, in this case, is on top of another
> >>>> file system.
> >>>>
> >>>> I do not see how mempools will help here.
> >>>>
> >>>> Please offer an alternative solution.
> >>> I have few questions.
> >>>
> >>> Can you please explain more detail? Another stackable filesystam
> >>> (e.g. ecryptfs) don't have such problem. Why nandsim have its issue?
> >>> What lock cause deadlock?
> >> The file systems are not stacked.  One is over nandsim, which nandsim
> >> does not know about because it is just a lowly NAND device, and, with
> >> the file cache option, one file system below to provide the file cache.
> >>
> >> The deadlock is the kernel writing out dirty pages to the top file system
> >> which writes to nandsim which writes to the bottom file system which
> >> allocates memory which causes dirty pages to be written out to the top
> >> file system, which tries to write to nandsim => deadlock.
> > 
> > You mean you want to prevent pageout() instead reclaim itself?
> 
> Yes
> 
> > Dropping filecache seems don't make recursive call, right?
> 
> Yes

o.k.

I really think the cache dropping shuoldn't be prevented because
typical linux box have lots droppable file cache and very few free pages.
but prevent pageout() seems not so problematic.

Thank you for good information.



^ permalink raw reply	[flat|nested] 108+ messages in thread

* Re: [PATCH 4/7] nandsim: Don't use PF_MEMALLOC
@ 2009-11-25  7:18                 ` KOSAKI Motohiro
  0 siblings, 0 replies; 108+ messages in thread
From: KOSAKI Motohiro @ 2009-11-25  7:18 UTC (permalink / raw)
  To: Adrian Hunter
  Cc: kosaki.motohiro, Bityutskiy Artem (Nokia-D/Helsinki),
	LKML, linux-mm, Andrew Morton, David Woodhouse, linux-mtd

> KOSAKI Motohiro wrote:
> >> KOSAKI Motohiro wrote:
> >>> Hi
> >>>
> >>> Thank you for this useful comments.
> >>>
> >>>>> I vaguely remember Adrian (CCed) did this on purpose. This is for the
> >>>>> case when nandsim emulates NAND flash on top of a file. So there are 2
> >>>>> file-systems involved: one sits on top of nandsim (e.g. UBIFS) and the
> >>>>> other owns the file which nandsim uses (e.g., ext3).
> >>>>>
> >>>>> And I really cannot remember off the top of my head why he needed
> >>>>> PF_MEMALLOC, but I think Adrian wanted to prevent the direct reclaim
> >>>>> path to re-enter, say UBIFS, and cause deadlock. But I'd thing that all
> >>>>> the allocations in vfs_read()/vfs_write() should be GFP_NOFS, so that
> >>>>> should not be a probelm?
> >>>>>
> >>>> Yes it needs PF_MEMALLOC to prevent deadlock because there can be a
> >>>> file system on top of nandsim which, in this case, is on top of another
> >>>> file system.
> >>>>
> >>>> I do not see how mempools will help here.
> >>>>
> >>>> Please offer an alternative solution.
> >>> I have few questions.
> >>>
> >>> Can you please explain more detail? Another stackable filesystam
> >>> (e.g. ecryptfs) don't have such problem. Why nandsim have its issue?
> >>> What lock cause deadlock?
> >> The file systems are not stacked.  One is over nandsim, which nandsim
> >> does not know about because it is just a lowly NAND device, and, with
> >> the file cache option, one file system below to provide the file cache.
> >>
> >> The deadlock is the kernel writing out dirty pages to the top file system
> >> which writes to nandsim which writes to the bottom file system which
> >> allocates memory which causes dirty pages to be written out to the top
> >> file system, which tries to write to nandsim => deadlock.
> > 
> > You mean you want to prevent pageout() instead reclaim itself?
> 
> Yes
> 
> > Dropping filecache seems don't make recursive call, right?
> 
> Yes

o.k.

I really think the cache dropping shuoldn't be prevented because
typical linux box have lots droppable file cache and very few free pages.
but prevent pageout() seems not so problematic.

Thank you for good information.


--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 108+ messages in thread

* Re: [PATCH 4/7] nandsim: Don't use PF_MEMALLOC
@ 2009-11-25  7:18                 ` KOSAKI Motohiro
  0 siblings, 0 replies; 108+ messages in thread
From: KOSAKI Motohiro @ 2009-11-25  7:18 UTC (permalink / raw)
  To: Adrian Hunter
  Cc: Bityutskiy Artem (Nokia-D/Helsinki),
	David Woodhouse, LKML, linux-mm, linux-mtd, kosaki.motohiro,
	Andrew Morton

> KOSAKI Motohiro wrote:
> >> KOSAKI Motohiro wrote:
> >>> Hi
> >>>
> >>> Thank you for this useful comments.
> >>>
> >>>>> I vaguely remember Adrian (CCed) did this on purpose. This is for the
> >>>>> case when nandsim emulates NAND flash on top of a file. So there are 2
> >>>>> file-systems involved: one sits on top of nandsim (e.g. UBIFS) and the
> >>>>> other owns the file which nandsim uses (e.g., ext3).
> >>>>>
> >>>>> And I really cannot remember off the top of my head why he needed
> >>>>> PF_MEMALLOC, but I think Adrian wanted to prevent the direct reclaim
> >>>>> path to re-enter, say UBIFS, and cause deadlock. But I'd thing that all
> >>>>> the allocations in vfs_read()/vfs_write() should be GFP_NOFS, so that
> >>>>> should not be a probelm?
> >>>>>
> >>>> Yes it needs PF_MEMALLOC to prevent deadlock because there can be a
> >>>> file system on top of nandsim which, in this case, is on top of another
> >>>> file system.
> >>>>
> >>>> I do not see how mempools will help here.
> >>>>
> >>>> Please offer an alternative solution.
> >>> I have few questions.
> >>>
> >>> Can you please explain more detail? Another stackable filesystam
> >>> (e.g. ecryptfs) don't have such problem. Why nandsim have its issue?
> >>> What lock cause deadlock?
> >> The file systems are not stacked.  One is over nandsim, which nandsim
> >> does not know about because it is just a lowly NAND device, and, with
> >> the file cache option, one file system below to provide the file cache.
> >>
> >> The deadlock is the kernel writing out dirty pages to the top file system
> >> which writes to nandsim which writes to the bottom file system which
> >> allocates memory which causes dirty pages to be written out to the top
> >> file system, which tries to write to nandsim => deadlock.
> > 
> > You mean you want to prevent pageout() instead reclaim itself?
> 
> Yes
> 
> > Dropping filecache seems don't make recursive call, right?
> 
> Yes

o.k.

I really think the cache dropping shuoldn't be prevented because
typical linux box have lots droppable file cache and very few free pages.
but prevent pageout() seems not so problematic.

Thank you for good information.

^ permalink raw reply	[flat|nested] 108+ messages in thread

end of thread, other threads:[~2009-11-25  7:19 UTC | newest]

Thread overview: 108+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2009-11-17  7:16 [PATCH 0/7] Kill PF_MEMALLOC abuse KOSAKI Motohiro
2009-11-17  7:16 ` KOSAKI Motohiro
2009-11-17  7:17 ` [PATCH 1/7] dm: use __GFP_HIGH instead PF_MEMALLOC KOSAKI Motohiro
2009-11-17  7:17   ` KOSAKI Motohiro
2009-11-17 13:15   ` Alasdair G Kergon
2009-11-17 13:15     ` Alasdair G Kergon
2009-11-17 13:15     ` Alasdair G Kergon
2009-11-18  6:17     ` KOSAKI Motohiro
2009-11-18  6:17     ` KOSAKI Motohiro
2009-11-18  6:17       ` KOSAKI Motohiro
2009-11-17  7:17 ` [PATCH 2/7] mmc: Don't use PF_MEMALLOC KOSAKI Motohiro
2009-11-17  7:17   ` KOSAKI Motohiro
2009-11-17 10:29   ` Alan Cox
2009-11-17 10:29     ` Alan Cox
2009-11-17 10:29     ` Alan Cox
2009-11-17 10:32     ` Minchan Kim
2009-11-17 10:32       ` Minchan Kim
2009-11-17 10:38       ` Oliver Neukum
2009-11-17 10:38         ` Oliver Neukum
2009-11-17 11:58     ` KOSAKI Motohiro
2009-11-17 11:58       ` KOSAKI Motohiro
2009-11-17 12:51       ` Minchan Kim
2009-11-17 12:51         ` Minchan Kim
2009-11-17 20:47         ` Peter Zijlstra
2009-11-17 20:47           ` Peter Zijlstra
2009-11-18  0:01           ` Minchan Kim
2009-11-18  0:01             ` Minchan Kim
2009-11-18  9:56             ` Peter Zijlstra
2009-11-18  9:56               ` Peter Zijlstra
2009-11-18 10:31               ` Minchan Kim
2009-11-18 10:31                 ` Minchan Kim
2009-11-18 10:54                 ` Peter Zijlstra
2009-11-18 10:54                   ` Peter Zijlstra
2009-11-18 11:15                   ` Minchan Kim
2009-11-18 11:15                     ` Minchan Kim
2009-11-17  7:18 ` [PATCH 3/7] mtd: " KOSAKI Motohiro
2009-11-17  7:18   ` KOSAKI Motohiro
2009-11-17  7:18   ` KOSAKI Motohiro
2009-11-17  7:19 ` [PATCH 4/7] nandsim: " KOSAKI Motohiro
2009-11-17  7:19   ` KOSAKI Motohiro
2009-11-17  7:19   ` KOSAKI Motohiro
2009-11-23 15:00   ` Artem Bityutskiy
2009-11-23 15:00     ` Artem Bityutskiy
2009-11-23 15:00     ` Artem Bityutskiy
2009-11-23 20:01     ` Adrian Hunter
2009-11-23 20:01       ` Adrian Hunter
2009-11-23 20:01       ` Adrian Hunter
2009-11-24 10:46       ` KOSAKI Motohiro
2009-11-24 10:46         ` KOSAKI Motohiro
2009-11-24 10:46         ` KOSAKI Motohiro
2009-11-24 11:56         ` Adrian Hunter
2009-11-24 11:56           ` Adrian Hunter
2009-11-24 11:56           ` Adrian Hunter
2009-11-25  0:42           ` KOSAKI Motohiro
2009-11-25  0:42             ` KOSAKI Motohiro
2009-11-25  0:42             ` KOSAKI Motohiro
2009-11-25  7:13             ` Adrian Hunter
2009-11-25  7:13               ` Adrian Hunter
2009-11-25  7:13               ` Adrian Hunter
2009-11-25  7:18               ` KOSAKI Motohiro
2009-11-25  7:18                 ` KOSAKI Motohiro
2009-11-25  7:18                 ` KOSAKI Motohiro
2009-11-17  7:21 ` [PATCH 5/7] Revert "Intel IOMMU: Avoid memory allocation failures in dma map api calls" KOSAKI Motohiro
2009-11-17  7:21   ` KOSAKI Motohiro
2009-11-17  7:22 ` [PATCH 6/7] cifs: Don't use PF_MEMALLOC KOSAKI Motohiro
2009-11-17  7:22   ` KOSAKI Motohiro
2009-11-17  7:32   ` [PATCH] Mark cifs mailing list as "moderated as non-subscribers" KOSAKI Motohiro
2009-11-17  7:32     ` KOSAKI Motohiro
2009-11-17 12:47   ` [PATCH 6/7] cifs: Don't use PF_MEMALLOC Jeff Layton
2009-11-17 12:47     ` Jeff Layton
2009-11-17 16:40     ` Steve French
2009-11-18  6:31       ` KOSAKI Motohiro
2009-11-18  6:31         ` KOSAKI Motohiro
2009-11-17 16:41     ` Steve French
2009-11-17  7:23 ` [PATCH 7/7] xfs: " KOSAKI Motohiro
2009-11-17  7:23   ` KOSAKI Motohiro
2009-11-17  7:23   ` KOSAKI Motohiro
2009-11-17 22:11   ` Dave Chinner
2009-11-17 22:11     ` Dave Chinner
2009-11-17 22:11     ` Dave Chinner
2009-11-17 22:11     ` Dave Chinner
2009-11-18  8:56     ` KOSAKI Motohiro
2009-11-18  8:56       ` KOSAKI Motohiro
2009-11-18  8:56       ` KOSAKI Motohiro
2009-11-18  8:56       ` KOSAKI Motohiro
2009-11-18 22:16       ` Dave Chinner
2009-11-18 22:16         ` Dave Chinner
2009-11-18 22:16         ` Dave Chinner
2009-11-17  8:07 ` [PATCH 0/7] Kill PF_MEMALLOC abuse David Rientjes
2009-11-17  8:07   ` David Rientjes
2009-11-17  8:33   ` KOSAKI Motohiro
2009-11-17  8:33     ` KOSAKI Motohiro
2009-11-17  8:36     ` David Rientjes
2009-11-17  8:36       ` David Rientjes
2009-11-17 20:56     ` Peter Zijlstra
2009-11-17 20:56       ` Peter Zijlstra
2009-11-18  5:55       ` KOSAKI Motohiro
2009-11-18  5:55         ` KOSAKI Motohiro
2009-11-17 10:15 ` Christoph Hellwig
2009-11-17 10:15   ` Christoph Hellwig
2009-11-17 10:24   ` KOSAKI Motohiro
2009-11-17 10:24     ` KOSAKI Motohiro
2009-11-17 10:27     ` Christoph Hellwig
2009-11-17 10:27       ` Christoph Hellwig
2009-11-17 12:24       ` KOSAKI Motohiro
2009-11-17 12:24         ` KOSAKI Motohiro
2009-11-17 12:47         ` Christoph Hellwig
2009-11-17 12:47           ` Christoph Hellwig

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.