* [PATCH 1/2] mempool: drop unnecessary and incorrect BUG_ON() from mempool_destroy()
@ 2011-12-22  0:18 Tejun Heo
  2011-12-22  0:19 ` [PATCH 2/2] mempool: fix first round failure behavior Tejun Heo
  2011-12-22  0:25 ` [PATCH 1/2] mempool: drop unnecessary and incorrect BUG_ON() from mempool_destroy() Andrew Morton
  0 siblings, 2 replies; 17+ messages in thread
From: Tejun Heo @ 2011-12-22  0:18 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Linus Torvalds, linux-kernel

mempool_destroy() is a thin wrapper around free_pool().  The only
thing it adds is BUG_ON(pool->curr_nr != pool->min_nr).  The intention
seems to be to enforce that all allocated elements are freed; however,
the BUG_ON() can't achieve that (it doesn't know anything about
objects above min_nr) and is incorrect, as mempool_resize() is allowed
to leave the pool extended but not filled.  Furthermore, panicking is
far worse than any memory leak, and there are better debug tools for
tracking memory leaks.

Drop the BUG_ON() from mempool_destroy() and, as that leaves the
function identical to free_pool(), replace free_pool() with it.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: stable@kernel.org
---
These patches are on top of the "mempool: fix and document
synchronization and memory barrier usage" patch[1].  Both are fixes
and it is probably a good idea to forward them to -stable.

[1] http://thread.gmane.org/gmane.linux.kernel/1231609/focus=1232127

 mm/mempool.c |   30 +++++++++++-------------------
 1 file changed, 11 insertions(+), 19 deletions(-)

Index: work/mm/mempool.c
===================================================================
--- work.orig/mm/mempool.c
+++ work/mm/mempool.c
@@ -27,7 +27,15 @@ static void *remove_element(mempool_t *p
 	return pool->elements[--pool->curr_nr];
 }
 
-static void free_pool(mempool_t *pool)
+/**
+ * mempool_destroy - deallocate a memory pool
+ * @pool:      pointer to the memory pool which was allocated via
+ *             mempool_create().
+ *
+ * Free all reserved elements in @pool and @pool itself.  This function
+ * only sleeps if the free_fn() function sleeps.
+ */
+void mempool_destroy(mempool_t *pool)
 {
 	while (pool->curr_nr) {
 		void *element = remove_element(pool);
@@ -36,6 +44,7 @@ static void free_pool(mempool_t *pool)
 	kfree(pool->elements);
 	kfree(pool);
 }
+EXPORT_SYMBOL(mempool_destroy);
 
 /**
  * mempool_create - create a memory pool
@@ -86,7 +95,7 @@ mempool_t *mempool_create_node(int min_n
 
 		element = pool->alloc(GFP_KERNEL, pool->pool_data);
 		if (unlikely(!element)) {
-			free_pool(pool);
+			mempool_destroy(pool);
 			return NULL;
 		}
 		add_element(pool, element);
@@ -172,23 +181,6 @@ out:
 EXPORT_SYMBOL(mempool_resize);
 
 /**
- * mempool_destroy - deallocate a memory pool
- * @pool:      pointer to the memory pool which was allocated via
- *             mempool_create().
- *
- * this function only sleeps if the free_fn() function sleeps. The caller
- * has to guarantee that all elements have been returned to the pool (ie:
- * freed) prior to calling mempool_destroy().
- */
-void mempool_destroy(mempool_t *pool)
-{
-	/* Check for outstanding elements */
-	BUG_ON(pool->curr_nr != pool->min_nr);
-	free_pool(pool);
-}
-EXPORT_SYMBOL(mempool_destroy);
-
-/**
  * mempool_alloc - allocate an element from a specific memory pool
  * @pool:      pointer to the memory pool which was allocated via
  *             mempool_create().

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH 2/2] mempool: fix first round failure behavior
  2011-12-22  0:18 [PATCH 1/2] mempool: drop unnecessary and incorrect BUG_ON() from mempool_destroy() Tejun Heo
@ 2011-12-22  0:19 ` Tejun Heo
  2011-12-22  0:32   ` Andrew Morton
  2011-12-22  0:46   ` [PATCH UPDATED " Tejun Heo
  2011-12-22  0:25 ` [PATCH 1/2] mempool: drop unnecessary and incorrect BUG_ON() from mempool_destroy() Andrew Morton
  1 sibling, 2 replies; 17+ messages in thread
From: Tejun Heo @ 2011-12-22  0:19 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Linus Torvalds, linux-kernel

For the initial allocation, mempool passes a modified gfp mask to the
backing allocator so that it doesn't try too hard when there are
reserved elements waiting in the pool; however, when that allocation
fails and the pool is empty too, it either waits for the pool to be
replenished before retrying or fails if !__GFP_WAIT.

* If the caller was calling in with GFP_ATOMIC, it never gets to try
  the emergency reserve.  Allocations which would have succeeded
  without mempool may fail, which is just wrong.

* An allocation which could have succeeded after a bit of reclaim now
  has to wait on the reserved items, and it's not like mempool doesn't
  retry with the original gfp mask.  It just does that *after* someone
  returns an element, pointlessly delaying things.

Fix it by retrying immediately with the gfp mask requested by the
caller if the first round of allocation attempts fails with the
modified mask.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: stable@kernel.org
---
 mm/mempool.c |   14 ++++++++++++--
 1 file changed, 12 insertions(+), 2 deletions(-)

Index: work/mm/mempool.c
===================================================================
--- work.orig/mm/mempool.c
+++ work/mm/mempool.c
@@ -221,14 +221,24 @@ repeat_alloc:
 		return element;
 	}
 
-	/* We must not sleep in the GFP_ATOMIC case */
+	/*
+	 * We use modified gfp mask for the first round.  If alloc failed
+	 * with that and @pool was empty too, immediately retry with the
+	 * original gfp mask.
+	 */
+	if (gfp_temp != gfp_mask) {
+		gfp_temp = gfp_mask;
+		spin_unlock_irqrestore(&pool->lock, flags);
+		goto repeat_alloc;
+	}
+
+	/* We must not sleep if !__GFP_WAIT */
 	if (!(gfp_mask & __GFP_WAIT)) {
 		spin_unlock_irqrestore(&pool->lock, flags);
 		return NULL;
 	}
 
 	/* Let's wait for someone else to return an element to @pool */
-	gfp_temp = gfp_mask;
 	init_wait(&wait);
 	prepare_to_wait(&pool->wait, &wait, TASK_UNINTERRUPTIBLE);
 


* Re: [PATCH 1/2] mempool: drop unnecessary and incorrect BUG_ON() from mempool_destroy()
  2011-12-22  0:18 [PATCH 1/2] mempool: drop unnecessary and incorrect BUG_ON() from mempool_destroy() Tejun Heo
  2011-12-22  0:19 ` [PATCH 2/2] mempool: fix first round failure behavior Tejun Heo
@ 2011-12-22  0:25 ` Andrew Morton
  2011-12-22  0:35   ` Tejun Heo
  1 sibling, 1 reply; 17+ messages in thread
From: Andrew Morton @ 2011-12-22  0:25 UTC (permalink / raw)
  To: Tejun Heo; +Cc: Linus Torvalds, linux-kernel

On Wed, 21 Dec 2011 16:18:00 -0800
Tejun Heo <tj@kernel.org> wrote:

> mempool_destroy() is a thin wrapper around free_pool().  The only
> thing it adds is BUG_ON(pool->curr_nr != pool->min_nr).  The intention
> seems to be to enforce that all allocated elements are freed; however,
> the BUG_ON() can't achieve that (it doesn't know anything about
> objects above min_nr) and is incorrect, as mempool_resize() is allowed
> to leave the pool extended but not filled.  Furthermore, panicking is
> far worse than any memory leak, and there are better debug tools for
> tracking memory leaks.
> 
> Drop the BUG_ON() from mempool_destroy() and, as that leaves the
> function identical to free_pool(), replace free_pool() with it.
> 
> Signed-off-by: Tejun Heo <tj@kernel.org>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: stable@kernel.org

(that's stable@vger.kernel.org)

> ---
> These patches are on top of "mempool: fix and document synchronization
> and memory barrier usage" patch[1].  Both are fixes and it probably is
> a good idea to forward to -stable.

I'm not sure that either of these is suitable for -stable.  There's no
demonstrated problem, nor even a likely theoretical one, is there?

If we do decide to backport, I don't think the -stable guys will want
the large-but-nice comment-adding patch, so both of these patches would
need to be reworked for -stable usage.  The first patch does apply
successfully to mainline.  The second does not.



* Re: [PATCH 2/2] mempool: fix first round failure behavior
  2011-12-22  0:19 ` [PATCH 2/2] mempool: fix first round failure behavior Tejun Heo
@ 2011-12-22  0:32   ` Andrew Morton
  2011-12-22  0:34     ` Tejun Heo
  2011-12-22  0:46   ` [PATCH UPDATED " Tejun Heo
  1 sibling, 1 reply; 17+ messages in thread
From: Andrew Morton @ 2011-12-22  0:32 UTC (permalink / raw)
  To: Tejun Heo; +Cc: Linus Torvalds, linux-kernel

On Wed, 21 Dec 2011 16:19:39 -0800
Tejun Heo <tj@kernel.org> wrote:

> For the initial allocation, mempool passes a modified gfp mask to the
> backing allocator so that it doesn't try too hard when there are
> reserved elements waiting in the pool; however, when that allocation
> fails and the pool is empty too, it either waits for the pool to be
> replenished before retrying or fails if !__GFP_WAIT.
> 
> * If the caller was calling in with GFP_ATOMIC, it never gets to try
>   the emergency reserve.  Allocations which would have succeeded
>   without mempool may fail, which is just wrong.
> 
> * An allocation which could have succeeded after a bit of reclaim now
>   has to wait on the reserved items, and it's not like mempool doesn't
>   retry with the original gfp mask.  It just does that *after* someone
>   returns an element, pointlessly delaying things.

This is a significant change in behaviour.  Previously the mempool code
would preserve emergency pools while waiting for someone to return an
item.  Now, it will permit many more items to be allocated, chewing
into the emergency pools.

We *know* that items will soon become available, so why not wait for
that to happen rather than consuming memory which less robust callers
could have utilised?

IOW, this change appears to make the kernel more vulnerable to memory
exhaustion failures?


* Re: [PATCH 2/2] mempool: fix first round failure behavior
  2011-12-22  0:32   ` Andrew Morton
@ 2011-12-22  0:34     ` Tejun Heo
  0 siblings, 0 replies; 17+ messages in thread
From: Tejun Heo @ 2011-12-22  0:34 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Linus Torvalds, linux-kernel

Hello, Andrew.

On Wed, Dec 21, 2011 at 4:32 PM, Andrew Morton
<akpm@linux-foundation.org> wrote:
> This is a significant change in behaviour.  Previously the mempool code
> would preserve emergency pools while waiting for someone to return an
> item.  Now, it will permit many more items to be allocated, chewing
> into the emergency pools.
>
> We *know* that items will soon become available, so why not wait for
> that to happen rather than consuming memory which less robust callers
> could have utilised?
>
> IOW, this change appears to make the kernel more vulnerable to memory
> exhaustion failures?

Yeah, that's me misreading the code.  The mempool allocator doesn't
retry with the original gfp mask.  It retries with, umm, a less
modified one, so the patch description is wrong but the behavior
w.r.t. the emergency pool doesn't change.  I'm rewriting the commit
message.

Thanks.

-- 
tejun


* Re: [PATCH 1/2] mempool: drop unnecessary and incorrect BUG_ON() from mempool_destroy()
  2011-12-22  0:25 ` [PATCH 1/2] mempool: drop unnecessary and incorrect BUG_ON() from mempool_destroy() Andrew Morton
@ 2011-12-22  0:35   ` Tejun Heo
  2011-12-22  0:40     ` Greg KH
  0 siblings, 1 reply; 17+ messages in thread
From: Tejun Heo @ 2011-12-22  0:35 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Linus Torvalds, linux-kernel, gregkh

Hello,

On Wed, Dec 21, 2011 at 04:25:19PM -0800, Andrew Morton wrote:
> > Signed-off-by: Tejun Heo <tj@kernel.org>
> > Cc: Andrew Morton <akpm@linux-foundation.org>
> > Cc: stable@kernel.org
> 
> (that's stable@vger.kernel.org)

(cc'ing Greg)

It has been stable@kernel.org for quite a while and Greg scans for
that Cc.  Even MAINTAINERS has that as the official mail address.  I
heard that the mailing alias is broken at the moment but wouldn't it
be better to fix that?

> > ---
> > These patches are on top of "mempool: fix and document synchronization
> > and memory barrier usage" patch[1].  Both are fixes and it probably is
> > a good idea to forward to -stable.
> 
> I'm not sure that either of these are suitable for -stable.  There's no
> demonstrated problem, nor even a likely theoretical one, is there?
> 
> If we do decide to backport, I don't think the -stable guys will want
> the large-but-nice comment-adding patch so both these patches would need to
> be reworked for -stable usage.  The first patch does apply successfully
> to mainline.  The second does not.

Hmmm... I think it should be possible to trip the BUG_ON() removed by
the first patch with a targeted enough attack, but AFAICS that would
require root privileges, so it might not be too bad.  The second one,
while not optimal, shouldn't be critical.  BTW, I missed something in
the second patch; I will post an updated one soon.

Thanks.

-- 
tejun


* Re: [PATCH 1/2] mempool: drop unnecessary and incorrect BUG_ON() from mempool_destroy()
  2011-12-22  0:35   ` Tejun Heo
@ 2011-12-22  0:40     ` Greg KH
  0 siblings, 0 replies; 17+ messages in thread
From: Greg KH @ 2011-12-22  0:40 UTC (permalink / raw)
  To: Tejun Heo; +Cc: Andrew Morton, Linus Torvalds, linux-kernel

On Wed, Dec 21, 2011 at 04:35:07PM -0800, Tejun Heo wrote:
> Hello,
> 
> On Wed, Dec 21, 2011 at 04:25:19PM -0800, Andrew Morton wrote:
> > > Signed-off-by: Tejun Heo <tj@kernel.org>
> > > Cc: Andrew Morton <akpm@linux-foundation.org>
> > > Cc: stable@kernel.org
> > 
> > (that's stable@vger.kernel.org)
> 
> (cc'ing Greg)
> 
> It has been stable@kernel.org for quite a while and Greg scans for
> that Cc.  Even MAINTAINERS has that as the official mail address.  I
> heard that the mailing alias is broken at the moment but wouldn't it
> be better to fix that?

I have a patch in one of my trees to update the MAINTAINERS file,
which will get backported to the older kernels as well; it's just
queued for 3.3, as it's not really a big deal.

And the alias is being worked on; sometimes it works, other times it
doesn't.  I'm not quite sure what is going on.

But either way, I will catch patches with cc: stable@vger.kernel.org or
stable@kernel.org when they hit Linus's tree, which is the important
thing, as that's the "real" path to get patches into stable releases.

thanks,

greg k-h


* [PATCH UPDATED 2/2] mempool: fix first round failure behavior
  2011-12-22  0:19 ` [PATCH 2/2] mempool: fix first round failure behavior Tejun Heo
  2011-12-22  0:32   ` Andrew Morton
@ 2011-12-22  0:46   ` Tejun Heo
  2011-12-22  1:09     ` Andrew Morton
  1 sibling, 1 reply; 17+ messages in thread
From: Tejun Heo @ 2011-12-22  0:46 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Linus Torvalds, linux-kernel

mempool modifies gfp_mask so that the backing allocator doesn't try
too hard or trigger a warning message when there's a pool to fall back
on.  In addition, for the first try, it removes __GFP_WAIT and IO, so
that it doesn't trigger reclaim or wait when the allocation can be
fulfilled from the pool; however, when that allocation fails and the
pool is empty too, it waits for the pool to be replenished before
retrying.

An allocation which could have succeeded after a bit of reclaim has to
wait on the reserved items, and it's not like mempool doesn't retry
with __GFP_WAIT and IO.  It just does that *after* someone returns an
element, pointlessly delaying things.

Fix it by retrying immediately if the first round of allocation
attempts w/o __GFP_WAIT and IO fails.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
---
The code hasn't changed.  Only the description and comment are
updated.  It doesn't affect anything regarding emergency pool.  It
just changes when the first retry happens.

That said, I still find it a bit unsettling that a GFP_ATOMIC
allocation which would otherwise succeed may fail when issued through
mempool.  Maybe the RTTD is clearing __GFP_NOMEMALLOC on retry if the
gfp requested by the caller is !__GFP_WAIT && !__GFP_NOMEMALLOC?

Thanks.

 mm/mempool.c |   13 +++++++++++--
 1 file changed, 11 insertions(+), 2 deletions(-)

Index: work/mm/mempool.c
===================================================================
--- work.orig/mm/mempool.c
+++ work/mm/mempool.c
@@ -221,14 +221,23 @@ repeat_alloc:
 		return element;
 	}
 
-	/* We must not sleep in the GFP_ATOMIC case */
+	/*
+	 * We use gfp mask w/o __GFP_WAIT or IO for the first round.  If
+	 * alloc failed with that and @pool was empty, retry immediately.
+	 */
+	if (gfp_temp != gfp_mask) {
+		gfp_temp = gfp_mask;
+		spin_unlock_irqrestore(&pool->lock, flags);
+		goto repeat_alloc;
+	}
+
+	/* We must not sleep if !__GFP_WAIT */
 	if (!(gfp_mask & __GFP_WAIT)) {
 		spin_unlock_irqrestore(&pool->lock, flags);
 		return NULL;
 	}
 
 	/* Let's wait for someone else to return an element to @pool */
-	gfp_temp = gfp_mask;
 	init_wait(&wait);
 	prepare_to_wait(&pool->wait, &wait, TASK_UNINTERRUPTIBLE);
 


* Re: [PATCH UPDATED 2/2] mempool: fix first round failure behavior
  2011-12-22  0:46   ` [PATCH UPDATED " Tejun Heo
@ 2011-12-22  1:09     ` Andrew Morton
  2011-12-22  1:23       ` Tejun Heo
  0 siblings, 1 reply; 17+ messages in thread
From: Andrew Morton @ 2011-12-22  1:09 UTC (permalink / raw)
  To: Tejun Heo; +Cc: Linus Torvalds, linux-kernel

On Wed, 21 Dec 2011 16:46:29 -0800
Tejun Heo <tj@kernel.org> wrote:

> mempool modifies gfp_mask so that the backing allocator doesn't try
> too hard or trigger a warning message when there's a pool to fall back
> on.  In addition, for the first try, it removes __GFP_WAIT and IO, so
> that it doesn't trigger reclaim or wait when the allocation can be
> fulfilled from the pool; however, when that allocation fails and the
> pool is empty too, it waits for the pool to be replenished before
> retrying.
> 
> An allocation which could have succeeded after a bit of reclaim has to
> wait on the reserved items, and it's not like mempool doesn't retry
> with __GFP_WAIT and IO.  It just does that *after* someone returns an
> element, pointlessly delaying things.
> 
> Fix it by retrying immediately if the first round of allocation
> attempts w/o __GFP_WAIT and IO fails.

If the pool is empty and the memory allocator is down into its
emergency reserves then we have:

Old behaviour: Wait for someone to return an item, then retry

New behaviour: enable page reclaim in gfp_mask, retry a
               single time then wait for someone to return an item.


So what we can expect to see is that in this low-memory situation,
mempool_alloc() will perform a lot more page reclaim, and more mempool
items will be let loose into the kernel.

I'm not sure what the effects of this will be.  I can't immediately
point at any bad ones.  Probably not much, as the mempool_alloc()
caller will probably be doing other allocations, using the
reclaim-permitting gfp_mask.

But I have painful memories of us (me and Jens, iirc) churning this
code over and over again until it stopped causing problems.  Some were
subtle and nasty.  Much dumpster diving into the pre-git changelogs
should be done before changing it, lest we rediscover long-fixed
problems :(


I have a feeling that if we go for "New behaviour" then the
implementation could be made simpler than this, but laziness
is setting in.

> That said, I still find it a bit unsettling that a GFP_ATOMIC
> allocation which would otherwise succeed may fail when issued through
> mempool.

Spose so.  It would be strange to call mempool_alloc() with GFP_ATOMIC.
Because "wait for an item to be returned" is the whole point of the
thing.

> Maybe the RTTD is clearing __GFP_NOMEMALLOC on retry if the
> gfp requested by the caller is !__GFP_WAIT && !__GFP_NOMEMALLOC?
> 

What the heck is an RTTD?

> +	/*
> +	 * We use gfp mask w/o __GFP_WAIT or IO for the first round.  If
> +	 * alloc failed with that and @pool was empty, retry immediately.
> +	 */
> +	if (gfp_temp != gfp_mask) {
> +		gfp_temp = gfp_mask;
> +		spin_unlock_irqrestore(&pool->lock, flags);
> +		goto repeat_alloc;
> +	}
> +

Here, have a faster kernel ;)

--- a/mm/mempool.c~mempool-fix-first-round-failure-behavior-fix
+++ a/mm/mempool.c
@@ -227,8 +227,8 @@ repeat_alloc:
 	 * original gfp mask.
 	 */
 	if (gfp_temp != gfp_mask) {
-		gfp_temp = gfp_mask;
 		spin_unlock_irqrestore(&pool->lock, flags);
+		gfp_temp = gfp_mask;
 		goto repeat_alloc;
 	}
 
_



* Re: [PATCH UPDATED 2/2] mempool: fix first round failure behavior
  2011-12-22  1:09     ` Andrew Morton
@ 2011-12-22  1:23       ` Tejun Heo
  2011-12-22  1:31         ` Tejun Heo
  2011-12-22 15:15         ` Vivek Goyal
  0 siblings, 2 replies; 17+ messages in thread
From: Tejun Heo @ 2011-12-22  1:23 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Linus Torvalds, linux-kernel

Hello, Andrew.

On Wed, Dec 21, 2011 at 05:09:19PM -0800, Andrew Morton wrote:
> If the pool is empty and the memory allocator is down into its
> emergency reserves then we have:
> 
> Old behaviour: Wait for someone to return an item, then retry
> 
> New behaviour: enable page reclaim in gfp_mask, retry a
>                single time then wait for someone to return an item.
> 
> So what we can expect to see is that in this low-memory situation,
> mempool_alloc() will perform a lot more page reclaim, and more mempool
> items will be let loose into the kernel.
> 
> I'm not sure what the effects of this will be.  I can't immediately
> point at any bad ones.  Probably not much, as the mempool_alloc()
> caller will probably be doing other allocations, using the
> reclaim-permitting gfp_mask.
> 
> But I have painful memories of us (me and Jens, iirc) churning this
> code over and over again until it stopped causing problems.  Some were
> subtle and nasty.  Much dumpster diving into the pre-git changelogs
> should be done before changing it, lest we rediscover long-fixed
> problems :(

I see.  It just seemed like a weird behavior and looking at the commit
log, there was originally code to kick reclaim there, so the sequence
made sense - first try w/o reclaim, look at the mempool, kick reclaim
and retry w/ GFP_WAIT and then wait for someone else to free.  That
part was removed by 20a77776c24 "[PATCH] mempool: simplify alloc" back
in 05.  In the process, it also lost retry w/ reclaim before waiting
for mempool reserves.

I was trying to add a percpu mempool and this bit me, as the percpu
allocator can't do NOIO and the above delayed retry logic ended up
adding a random 5s delay (or until the next free).

> > That said, I still find it a bit unsettling that a GFP_ATOMIC
> > allocation which would otherwise succeed may fail when issued through
> > mempool.
> 
> Spose so.  It would be strange to call mempool_alloc() with GFP_ATOMIC.
> Because "wait for an item to be returned" is the whole point of the
> thing.

Yeah, but the pool can be used from multiple code paths and I think
it's plausible to use it that way and expect at least the same or
better alloc behavior as not using mempool.  Eh... this doesn't really
affect correctness, so it's not such a big deal, but still weird.

> > Maybe the RTTD is clearing __GFP_NOMEMALLOC on retry if the
> > gfp requested by the caller is !__GFP_WAIT && !__GFP_NOMEMALLOC?
> 
> What the heck is an RTTD?

Right thing to do?  Hmmm... I thought other people were using it too.
It's quite possible that I just dreamed it up tho.

> > +	/*
> > +	 * We use gfp mask w/o __GFP_WAIT or IO for the first round.  If
> > +	 * alloc failed with that and @pool was empty, retry immediately.
> > +	 */
> > +	if (gfp_temp != gfp_mask) {
> > +		gfp_temp = gfp_mask;
> > +		spin_unlock_irqrestore(&pool->lock, flags);
> > +		goto repeat_alloc;
> > +	}
> > +
> 
> Here, have a faster kernel ;)

;)

Thanks.

-- 
tejun


* Re: [PATCH UPDATED 2/2] mempool: fix first round failure behavior
  2011-12-22  1:23       ` Tejun Heo
@ 2011-12-22  1:31         ` Tejun Heo
  2011-12-22 15:15         ` Vivek Goyal
  1 sibling, 0 replies; 17+ messages in thread
From: Tejun Heo @ 2011-12-22  1:31 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Linus Torvalds, linux-kernel

On Wed, Dec 21, 2011 at 05:23:12PM -0800, Tejun Heo wrote:
> I see.  It just seemed like a weird behavior and looking at the commit
> log, there was originally code to kick reclaim there, so the sequence
> made sense - first try w/o reclaim, look at the mempool, kick reclaim
> and retry w/ GFP_WAIT and then wait for someone else to free.  That
> part was removed by 20a77776c24 "[PATCH] mempool: simplify alloc" back
> in 05.  In the process, it also lost retry w/ reclaim before waiting
> for mempool reserves.

Oops, it isn't that relevant but the original code retried w/ reclaim
before looking at mempool if less than half were left in the pool.

-- 
tejun


* Re: [PATCH UPDATED 2/2] mempool: fix first round failure behavior
  2011-12-22  1:23       ` Tejun Heo
  2011-12-22  1:31         ` Tejun Heo
@ 2011-12-22 15:15         ` Vivek Goyal
  2011-12-22 15:20           ` Vivek Goyal
  2011-12-22 15:21           ` Tejun Heo
  1 sibling, 2 replies; 17+ messages in thread
From: Vivek Goyal @ 2011-12-22 15:15 UTC (permalink / raw)
  To: Tejun Heo; +Cc: Andrew Morton, Linus Torvalds, linux-kernel

On Wed, Dec 21, 2011 at 05:23:12PM -0800, Tejun Heo wrote:
> Hello, Andrew.
> 
> On Wed, Dec 21, 2011 at 05:09:19PM -0800, Andrew Morton wrote:
> > If the pool is empty and the memory allocator is down into its
> > emergency reserves then we have:
> > 
> > Old behaviour: Wait for someone to return an item, then retry
> > 
> > New behaviour: enable page reclaim in gfp_mask, retry a
> >                single time then wait for someone to return an item.
> > 
> > So what we can expect to see is that in this low-memory situation,
> > mempool_alloc() will perform a lot more page reclaim, and more mempool
> > items will be let loose into the kernel.
> > 
> > I'm not sure what the effects of this will be.  I can't immediately
> > point at any bad ones.  Probably not much, as the mempool_alloc()
> > caller will probably be doing other allocations, using the
> > reclaim-permitting gfp_mask.
> > 
> > But I have painful memories of us (me and Jens, iirc) churning this
> > code over and over again until it stopped causing problems.  Some were
> > subtle and nasty.  Much dumpster diving into the pre-git changelogs
> > should be done before changing it, lest we rediscover long-fixed
> > problems :(
> 
> I see.  It just seemed like a weird behavior and looking at the commit
> log, there was originally code to kick reclaim there, so the sequence
> made sense - first try w/o reclaim, look at the mempool, kick reclaim
> and retry w/ GFP_WAIT and then wait for someone else to free.  That
> part was removed by 20a77776c24 "[PATCH] mempool: simplify alloc" back
> in 05.  In the process, it also lost retry w/ reclaim before waiting
> for mempool reserves.
> 
> I was trying to add a percpu mempool and this bit me, as the percpu
> allocator can't do NOIO and the above delayed retry logic ended up
> adding a random 5s delay (or until the next free).
> 

Tejun,

For blkio to use these percpu mempools, I guess we shall have to
change the notion that an allocated object from a mempool is returned
to the pool soon.  We might not be returning these per-cpu stat
objects to the mempool for a very long time (these will be returned to
the pool only when somebody deletes the cgroup).

Thanks
Vivek


* Re: [PATCH UPDATED 2/2] mempool: fix first round failure behavior
  2011-12-22 15:15         ` Vivek Goyal
@ 2011-12-22 15:20           ` Vivek Goyal
  2011-12-22 15:58             ` Tejun Heo
  2011-12-22 15:21           ` Tejun Heo
  1 sibling, 1 reply; 17+ messages in thread
From: Vivek Goyal @ 2011-12-22 15:20 UTC (permalink / raw)
  To: Tejun Heo; +Cc: Andrew Morton, Linus Torvalds, linux-kernel

On Thu, Dec 22, 2011 at 10:15:29AM -0500, Vivek Goyal wrote:
> On Wed, Dec 21, 2011 at 05:23:12PM -0800, Tejun Heo wrote:
> > Hello, Andrew.
> > 
> > On Wed, Dec 21, 2011 at 05:09:19PM -0800, Andrew Morton wrote:
> > > If the pool is empty and the memory allocator is down into its
> > > emergency reserves then we have:
> > > 
> > > Old behaviour: Wait for someone to return an item, then retry
> > > 
> > > New behaviour: enable page reclaim in gfp_mask, retry a
> > >                single time then wait for someone to return an item.
> > > 
> > > So what we can expect to see is that in this low-memory situation,
> > > mempool_alloc() will perform a lot more page reclaim, and more mempool
> > > items will be let loose into the kernel.
> > > 
> > > I'm not sure what the effects of this will be.  I can't immediately
> > > point at any bad ones.  Probably not much, as the mempool_alloc()
> > > caller will probably be doing other allocations, using the
> > > reclaim-permitting gfp_mask.
> > > 
> > > But I have painful memories of us (me and Jens, iirc) churning this
> > > code over and over again until it stopped causing problems.  Some were
> > > subtle and nasty.  Much dumpster diving into the pre-git changelogs
> > > should be done before changing it, lest we rediscover long-fixed
> > > problems :(
> > 
> > I see.  It just seemed like a weird behavior and looking at the commit
> > log, there was originally code to kick reclaim there, so the sequence
> > made sense - first try w/o reclaim, look at the mempool, kick reclaim
> > and retry w/ GFP_WAIT and then wait for someone else to free.  That
> > part was removed by 20a77776c24 "[PATCH] mempool: simplify alloc" back
> > in 05.  In the process, it also lost retry w/ reclaim before waiting
> > for mempool reserves.
> > 
> > I was trying to add a percpu mempool and this bit me, as the percpu
> > allocator can't do NOIO and the above delayed retry logic ended up
> > adding a random 5s delay (or until the next free).
> > 
> 
> Tejun,
> 
> For blkio to use these percpu mempools, I guess we shall have to
> change the notion that an allocated object from a mempool is returned
> to the pool soon.  We might not be returning these per-cpu stat
> objects to the mempool for a very long time (these will be returned to
> the pool only when somebody deletes the cgroup).

Hi Tejun,

Also, mempool expects all the objects to be returned to the pool at
the time of destroy.  Given the fact that blkio groups are reference
counted, theoretically they can be freed later.

I was playing a bit with maintaining a pool of blkio group objects.  I
ran into the issue of the blkio group not having any pointer to the
request queue or cfqd, as that is required to retrieve the pool
pointer to which this object has to go.  Stashing a pointer in the
blkio group then brings back the issue of holding a reference on the
queue or cfqd, etc.  So that makes it no simpler than my current
implementation, and I gave up.

Thanks
Vivek


* Re: [PATCH UPDATED 2/2] mempool: fix first round failure behavior
  2011-12-22 15:15         ` Vivek Goyal
  2011-12-22 15:20           ` Vivek Goyal
@ 2011-12-22 15:21           ` Tejun Heo
  1 sibling, 0 replies; 17+ messages in thread
From: Tejun Heo @ 2011-12-22 15:21 UTC (permalink / raw)
  To: Vivek Goyal; +Cc: Andrew Morton, Linus Torvalds, linux-kernel

Hello,

On Thu, Dec 22, 2011 at 7:15 AM, Vivek Goyal <vgoyal@redhat.com> wrote:
> For blkio to use these percpu mempools, I guess we shall have to
> change the notion that an allocated object from a mempool is returned
> to the pool soon.  We might not be returning these per-cpu stat
> objects to the mempool for a very long time (these will be returned
> to the pool only when somebody deletes the cgroup).

I'm almost done with it.  It can behave as an allocation buffer which
is filled from a more permissive context and consumed in a more
constricted one, and the async filling kicks any waiting allocator, so
I think we should be fine.

Thanks.

-- 
tejun


* Re: [PATCH UPDATED 2/2] mempool: fix first round failure behavior
  2011-12-22 15:20           ` Vivek Goyal
@ 2011-12-22 15:58             ` Tejun Heo
  2011-12-22 16:04               ` Vivek Goyal
  0 siblings, 1 reply; 17+ messages in thread
From: Tejun Heo @ 2011-12-22 15:58 UTC (permalink / raw)
  To: Vivek Goyal; +Cc: Andrew Morton, Linus Torvalds, linux-kernel

Hello,

On Thu, Dec 22, 2011 at 7:20 AM, Vivek Goyal <vgoyal@redhat.com> wrote:
> Also, a mempool expects all the objects to be returned to the pool at the
> time of destroy. Given the fact that blkio groups are reference counted,
> theoretically they can be freed later.

I removed that part, but no, you still can't free objects after mempool
destruction, refcounted or not. The mempool is destroyed on module unload.
Which text would free those objects? If that's theoretically possible,
the code is broken.

Thanks.

-- 
tejun


* Re: [PATCH UPDATED 2/2] mempool: fix first round failure behavior
  2011-12-22 15:58             ` Tejun Heo
@ 2011-12-22 16:04               ` Vivek Goyal
  2011-12-22 16:15                 ` Tejun Heo
  0 siblings, 1 reply; 17+ messages in thread
From: Vivek Goyal @ 2011-12-22 16:04 UTC (permalink / raw)
  To: Tejun Heo; +Cc: Andrew Morton, Linus Torvalds, linux-kernel

On Thu, Dec 22, 2011 at 07:58:18AM -0800, Tejun Heo wrote:
> Hello,
> 
> On Thu, Dec 22, 2011 at 7:20 AM, Vivek Goyal <vgoyal@redhat.com> wrote:
> > Also, a mempool expects all the objects to be returned to the pool at the
> > time of destroy. Given the fact that blkio groups are reference counted,
> > theoretically they can be freed later.
> 
> I removed that part, but no, you still can't free objects after mempool
> destruction, refcounted or not. The mempool is destroyed on module unload.
> Which text would free those objects? If that's theoretically possible,
> the code is broken.

That's a good point. I did not think about module unload. I think my
current patch for allocating per-cpu objects from a worker thread is then
buggy, as on IO scheduler exit I don't try to flush the possibly
in-progress work[s].

Thanks
Vivek


* Re: [PATCH UPDATED 2/2] mempool: fix first round failure behavior
  2011-12-22 16:04               ` Vivek Goyal
@ 2011-12-22 16:15                 ` Tejun Heo
  0 siblings, 0 replies; 17+ messages in thread
From: Tejun Heo @ 2011-12-22 16:15 UTC (permalink / raw)
  To: Vivek Goyal; +Cc: Andrew Morton, Linus Torvalds, linux-kernel

Hello, Vivek.

On Thu, Dec 22, 2011 at 11:04:23AM -0500, Vivek Goyal wrote:
> That's a good point. I did not think about module unload. I think my
> current patch for allocating per-cpu objects from a worker thread is then
> buggy, as on IO scheduler exit I don't try to flush the possibly
> in-progress work[s].

Hmmm... I'm not sure whether backporting the mempool stuff would be a
viable approach for this merge window and -stable, and we might have to
come up with some kind of minimal solution.  I don't yet know what that
should look like, but let's talk about that in a different thread.

Thanks.

-- 
tejun


end of thread, other threads:[~2011-12-22 16:15 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-12-22  0:18 [PATCH 1/2] mempool: drop unnecessary and incorrect BUG_ON() from mempool_destroy() Tejun Heo
2011-12-22  0:19 ` [PATCH 2/2] mempool: fix first round failure behavior Tejun Heo
2011-12-22  0:32   ` Andrew Morton
2011-12-22  0:34     ` Tejun Heo
2011-12-22  0:46   ` [PATCH UPDATED " Tejun Heo
2011-12-22  1:09     ` Andrew Morton
2011-12-22  1:23       ` Tejun Heo
2011-12-22  1:31         ` Tejun Heo
2011-12-22 15:15         ` Vivek Goyal
2011-12-22 15:20           ` Vivek Goyal
2011-12-22 15:58             ` Tejun Heo
2011-12-22 16:04               ` Vivek Goyal
2011-12-22 16:15                 ` Tejun Heo
2011-12-22 15:21           ` Tejun Heo
2011-12-22  0:25 ` [PATCH 1/2] mempool: drop unnecessary and incorrect BUG_ON() from mempool_destroy() Andrew Morton
2011-12-22  0:35   ` Tejun Heo
2011-12-22  0:40     ` Greg KH
