linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH RFC] lib: Make radix_tree_node_alloc() irq safe
@ 2013-07-16 17:06 Jan Kara
  2013-07-17 20:14 ` Jens Axboe
  2013-07-17 23:12 ` Andrew Morton
  0 siblings, 2 replies; 11+ messages in thread
From: Jan Kara @ 2013-07-16 17:06 UTC (permalink / raw)
  To: Andrew Morton; +Cc: LKML, linux-mm, Jens Axboe, Jan Kara

With users of radix_tree_preload() run from interrupt (CFQ is one such
possible user), the following race can happen:

radix_tree_preload()
...
radix_tree_insert()
  radix_tree_node_alloc()
    if (rtp->nr) {
      ret = rtp->nodes[rtp->nr - 1];
<interrupt>
...
radix_tree_preload()
...
radix_tree_insert()
  radix_tree_node_alloc()
    if (rtp->nr) {
      ret = rtp->nodes[rtp->nr - 1];

And we give out one radix tree node twice. That clearly results in radix
tree corruption with different results (usually OOPS) depending on which
two users of radix tree race.

Fix the problem by disabling interrupts when working with rtp variable.
In-interrupt user can still deplete our preloaded nodes but at least we
won't corrupt radix trees.

Signed-off-by: Jan Kara <jack@suse.cz>
---
 lib/radix-tree.c | 19 ++++++++++++++++---
 1 file changed, 16 insertions(+), 3 deletions(-)

  There are some questions regarding this patch:
Do we really want to allow in-interrupt users of radix_tree_preload()?  CFQ
could certainly do this in older kernels but that particular call site where I
saw the bug hit isn't there anymore so I'm not sure this can really happen with
recent kernels.

Also it is actually harmful to do preloading if you are in interrupt context
anyway. The disadvantage of disallowing radix_tree_preload() in interrupt is
that we would need to tweak radix_tree_node_alloc() to somehow recognize
whether the caller wants it to use preloaded nodes or not and that callers
would have to get it right (although maybe some magic in radix_tree_preload()
could handle that).

Opinions?

diff --git a/lib/radix-tree.c b/lib/radix-tree.c
index e796429..6f1045d 100644
--- a/lib/radix-tree.c
+++ b/lib/radix-tree.c
@@ -209,18 +209,26 @@ radix_tree_node_alloc(struct radix_tree_root *root)
 
 	if (!(gfp_mask & __GFP_WAIT)) {
 		struct radix_tree_preload *rtp;
+		unsigned long flags;
 
 		/*
 		 * Provided the caller has preloaded here, we will always
 		 * succeed in getting a node here (and never reach
-		 * kmem_cache_alloc)
+		 * kmem_cache_alloc)... unless we race with interrupt also
+		 * consuming preloaded nodes.
 		 */
 		rtp = &__get_cpu_var(radix_tree_preloads);
+		/*
+		 * Disable interrupts to make sure radix_tree_node_alloc()
+		 * called from interrupt cannot return the same node as we do.
+		 */
+		local_irq_save(flags);
 		if (rtp->nr) {
 			ret = rtp->nodes[rtp->nr - 1];
 			rtp->nodes[rtp->nr - 1] = NULL;
 			rtp->nr--;
 		}
+		local_irq_restore(flags);
 	}
 	if (ret == NULL)
 		ret = kmem_cache_alloc(radix_tree_node_cachep, gfp_mask);
@@ -269,6 +277,7 @@ int radix_tree_preload(gfp_t gfp_mask)
 	struct radix_tree_preload *rtp;
 	struct radix_tree_node *node;
 	int ret = -ENOMEM;
+	unsigned long flags;
 
 	preempt_disable();
 	rtp = &__get_cpu_var(radix_tree_preloads);
@@ -278,11 +287,15 @@ int radix_tree_preload(gfp_t gfp_mask)
 		if (node == NULL)
 			goto out;
 		preempt_disable();
+		local_irq_save(flags);
 		rtp = &__get_cpu_var(radix_tree_preloads);
-		if (rtp->nr < ARRAY_SIZE(rtp->nodes))
+		if (rtp->nr < ARRAY_SIZE(rtp->nodes)) {
 			rtp->nodes[rtp->nr++] = node;
-		else
+			local_irq_restore(flags);
+		} else {
+			local_irq_restore(flags);
 			kmem_cache_free(radix_tree_node_cachep, node);
+		}
 	}
 	ret = 0;
 out:
-- 
1.8.1.4


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH RFC] lib: Make radix_tree_node_alloc() irq safe
  2013-07-16 17:06 [PATCH RFC] lib: Make radix_tree_node_alloc() irq safe Jan Kara
@ 2013-07-17 20:14 ` Jens Axboe
  2013-07-17 23:12 ` Andrew Morton
  1 sibling, 0 replies; 11+ messages in thread
From: Jens Axboe @ 2013-07-17 20:14 UTC (permalink / raw)
  To: Jan Kara; +Cc: Andrew Morton, LKML, linux-mm

On Tue, Jul 16 2013, Jan Kara wrote:
> With users of radix_tree_preload() run from interrupt (CFQ is one such
> possible user), the following race can happen:
> 
> radix_tree_preload()
> ...
> radix_tree_insert()
>   radix_tree_node_alloc()
>     if (rtp->nr) {
>       ret = rtp->nodes[rtp->nr - 1];
> <interrupt>
> ...
> radix_tree_preload()
> ...
> radix_tree_insert()
>   radix_tree_node_alloc()
>     if (rtp->nr) {
>       ret = rtp->nodes[rtp->nr - 1];
> 
> And we give out one radix tree node twice. That clearly results in radix
> tree corruption with different results (usually OOPS) depending on which
> two users of radix tree race.
> 
> Fix the problem by disabling interrupts when working with rtp variable.
> In-interrupt user can still deplete our preloaded nodes but at least we
> won't corrupt radix trees.

Looks good to me, great catch Jan.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH RFC] lib: Make radix_tree_node_alloc() irq safe
  2013-07-16 17:06 [PATCH RFC] lib: Make radix_tree_node_alloc() irq safe Jan Kara
  2013-07-17 20:14 ` Jens Axboe
@ 2013-07-17 23:12 ` Andrew Morton
  2013-07-17 23:16   ` David Daney
                     ` (2 more replies)
  1 sibling, 3 replies; 11+ messages in thread
From: Andrew Morton @ 2013-07-17 23:12 UTC (permalink / raw)
  To: Jan Kara; +Cc: LKML, linux-mm, Jens Axboe

On Tue, 16 Jul 2013 19:06:30 +0200 Jan Kara <jack@suse.cz> wrote:

> With users of radix_tree_preload() run from interrupt (CFQ is one such
> possible user), the following race can happen:
> 
> radix_tree_preload()
> ...
> radix_tree_insert()
>   radix_tree_node_alloc()
>     if (rtp->nr) {
>       ret = rtp->nodes[rtp->nr - 1];
> <interrupt>
> ...
> radix_tree_preload()
> ...
> radix_tree_insert()
>   radix_tree_node_alloc()
>     if (rtp->nr) {
>       ret = rtp->nodes[rtp->nr - 1];
> 
> And we give out one radix tree node twice. That clearly results in radix
> tree corruption with different results (usually OOPS) depending on which
> two users of radix tree race.
> 
> Fix the problem by disabling interrupts when working with rtp variable.
> In-interrupt user can still deplete our preloaded nodes but at least we
> won't corrupt radix trees.
> 
> ...
>
>   There are some questions regarding this patch:
> Do we really want to allow in-interrupt users of radix_tree_preload()?  CFQ
> could certainly do this in older kernels but that particular call site where I
> saw the bug hit isn't there anymore so I'm not sure this can really happen with
> recent kernels.

Well, it was never anticipated that interrupt-time code would run
radix_tree_preload().  The whole point in the preloading was to be able
to perform GFP_KERNEL allocations before entering the spinlocked region
which needs to allocate memory.

Doing all that from within an interrupt is daft, because the interrupt code
can't use GFP_KERNEL anyway.

> Also it is actually harmful to do preloading if you are in interrupt context
> anyway. The disadvantage of disallowing radix_tree_preload() in interrupt is
> that we would need to tweak radix_tree_node_alloc() to somehow recognize
> whether the caller wants it to use preloaded nodes or not and that callers
> would have to get it right (although maybe some magic in radix_tree_preload()
> could handle that).
> 
> Opinions?

BUG_ON(in_interrupt()) :)



^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH RFC] lib: Make radix_tree_node_alloc() irq safe
  2013-07-17 23:12 ` Andrew Morton
@ 2013-07-17 23:16   ` David Daney
  2013-07-18 13:09   ` Jan Kara
  2013-07-18 21:25   ` Jens Axboe
  2 siblings, 0 replies; 11+ messages in thread
From: David Daney @ 2013-07-17 23:16 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Jan Kara, LKML, linux-mm, Jens Axboe

On 07/17/2013 04:12 PM, Andrew Morton wrote:
> On Tue, 16 Jul 2013 19:06:30 +0200 Jan Kara <jack@suse.cz> wrote:
>
>> With users of radix_tree_preload() run from interrupt (CFQ is one such
>> possible user), the following race can happen:
>>
>> radix_tree_preload()
>> ...
>> radix_tree_insert()
>>    radix_tree_node_alloc()
>>      if (rtp->nr) {
>>        ret = rtp->nodes[rtp->nr - 1];
>> <interrupt>
>> ...
>> radix_tree_preload()
>> ...
>> radix_tree_insert()
>>    radix_tree_node_alloc()
>>      if (rtp->nr) {
>>        ret = rtp->nodes[rtp->nr - 1];
>>
>> And we give out one radix tree node twice. That clearly results in radix
>> tree corruption with different results (usually OOPS) depending on which
>> two users of radix tree race.
>>
>> Fix the problem by disabling interrupts when working with rtp variable.
>> In-interrupt user can still deplete our preloaded nodes but at least we
>> won't corrupt radix trees.
>>
>> ...
>>
>>    There are some questions regarding this patch:
>> Do we really want to allow in-interrupt users of radix_tree_preload()?  CFQ
>> could certainly do this in older kernels but that particular call site where I
>> saw the bug hit isn't there anymore so I'm not sure this can really happen with
>> recent kernels.
>
> Well, it was never anticipated that interrupt-time code would run
> radix_tree_preload().  The whole point in the preloading was to be able
> to perform GFP_KERNEL allocations before entering the spinlocked region
> which needs to allocate memory.
>
> Doing all that from within an interrupt is daft, because the interrupt code
> can't use GFP_KERNEL anyway.
>
>> Also it is actually harmful to do preloading if you are in interrupt context
>> anyway. The disadvantage of disallowing radix_tree_preload() in interrupt is
>> that we would need to tweak radix_tree_node_alloc() to somehow recognize
>> whether the caller wants it to use preloaded nodes or not and that callers
>> would have to get it right (although maybe some magic in radix_tree_preload()
>> could handle that).
>>
>> Opinions?
>
> BUG_ON(in_interrupt()) :)

Is is really that severe?  How about...

WARN_ON() instead?

David Daney



^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH RFC] lib: Make radix_tree_node_alloc() irq safe
  2013-07-17 23:12 ` Andrew Morton
  2013-07-17 23:16   ` David Daney
@ 2013-07-18 13:09   ` Jan Kara
  2013-07-18 21:30     ` Jens Axboe
  2013-07-18 21:37     ` Andrew Morton
  2013-07-18 21:25   ` Jens Axboe
  2 siblings, 2 replies; 11+ messages in thread
From: Jan Kara @ 2013-07-18 13:09 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Jan Kara, LKML, linux-mm, Jens Axboe

On Wed 17-07-13 16:12:00, Andrew Morton wrote:
> On Tue, 16 Jul 2013 19:06:30 +0200 Jan Kara <jack@suse.cz> wrote:
> 
> > With users of radix_tree_preload() run from interrupt (CFQ is one such
> > possible user), the following race can happen:
> > 
> > radix_tree_preload()
> > ...
> > radix_tree_insert()
> >   radix_tree_node_alloc()
> >     if (rtp->nr) {
> >       ret = rtp->nodes[rtp->nr - 1];
> > <interrupt>
> > ...
> > radix_tree_preload()
> > ...
> > radix_tree_insert()
> >   radix_tree_node_alloc()
> >     if (rtp->nr) {
> >       ret = rtp->nodes[rtp->nr - 1];
> > 
> > And we give out one radix tree node twice. That clearly results in radix
> > tree corruption with different results (usually OOPS) depending on which
> > two users of radix tree race.
> > 
> > Fix the problem by disabling interrupts when working with rtp variable.
> > In-interrupt user can still deplete our preloaded nodes but at least we
> > won't corrupt radix trees.
> > 
> > ...
> >
> >   There are some questions regarding this patch:
> > Do we really want to allow in-interrupt users of radix_tree_preload()?  CFQ
> > could certainly do this in older kernels but that particular call site where I
> > saw the bug hit isn't there anymore so I'm not sure this can really happen with
> > recent kernels.
> 
> Well, it was never anticipated that interrupt-time code would run
> radix_tree_preload().  The whole point in the preloading was to be able
> to perform GFP_KERNEL allocations before entering the spinlocked region
> which needs to allocate memory.
> 
> Doing all that from within an interrupt is daft, because the interrupt code
> can't use GFP_KERNEL anyway.
  Fully agreed here.

> > Also it is actually harmful to do preloading if you are in interrupt context
> > anyway. The disadvantage of disallowing radix_tree_preload() in interrupt is
> > that we would need to tweak radix_tree_node_alloc() to somehow recognize
> > whether the caller wants it to use preloaded nodes or not and that callers
> > would have to get it right (although maybe some magic in radix_tree_preload()
> > could handle that).
> > 
> > Opinions?
> 
> BUG_ON(in_interrupt()) :)
  Or maybe WARN_ON()... But it's not so easy :) Currently radix tree code
assumes that if gfp_mask doesn't have __GFP_WAIT set caller has performed
radix_tree_preload(). Clearly this will stop working for in-interrupt users
of radix tree. So how do we propagate the information from the caller of
radix_tree_insert() down to radix_tree_node_alloc() whether the preload has
been performed or not? Will we rely on in_interrupt() or use some special
gfp_mask bit?

Secondly, CFQ has this unpleasant property that some functions are
sometimes called from interrupt context and sometimes not. So these
functions would have to check in what context they are called and either
perform preload or not. That's doable but it's going to be a bit ugly and
has to match the check in radix_tree_node_alloc() whether preload should be
used or not. So leaving the checking to the users of radix tree looks
fragile.  So maybe we could just silently exit from radix_tree_preload()
when we are in_interrupt()?

								Honza
  
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH RFC] lib: Make radix_tree_node_alloc() irq safe
  2013-07-17 23:12 ` Andrew Morton
  2013-07-17 23:16   ` David Daney
  2013-07-18 13:09   ` Jan Kara
@ 2013-07-18 21:25   ` Jens Axboe
  2 siblings, 0 replies; 11+ messages in thread
From: Jens Axboe @ 2013-07-18 21:25 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Jan Kara, LKML, linux-mm

On 07/17/2013 05:12 PM, Andrew Morton wrote:
> On Tue, 16 Jul 2013 19:06:30 +0200 Jan Kara <jack@suse.cz> wrote:
> 
>> With users of radix_tree_preload() run from interrupt (CFQ is one such
>> possible user), the following race can happen:
>>
>> radix_tree_preload()
>> ...
>> radix_tree_insert()
>>   radix_tree_node_alloc()
>>     if (rtp->nr) {
>>       ret = rtp->nodes[rtp->nr - 1];
>> <interrupt>
>> ...
>> radix_tree_preload()
>> ...
>> radix_tree_insert()
>>   radix_tree_node_alloc()
>>     if (rtp->nr) {
>>       ret = rtp->nodes[rtp->nr - 1];
>>
>> And we give out one radix tree node twice. That clearly results in radix
>> tree corruption with different results (usually OOPS) depending on which
>> two users of radix tree race.
>>
>> Fix the problem by disabling interrupts when working with rtp variable.
>> In-interrupt user can still deplete our preloaded nodes but at least we
>> won't corrupt radix trees.
>>
>> ...
>>
>>   There are some questions regarding this patch:
>> Do we really want to allow in-interrupt users of radix_tree_preload()?  CFQ
>> could certainly do this in older kernels but that particular call site where I
>> saw the bug hit isn't there anymore so I'm not sure this can really happen with
>> recent kernels.
> 
> Well, it was never anticipated that interrupt-time code would run
> radix_tree_preload().  The whole point in the preloading was to be able
> to perform GFP_KERNEL allocations before entering the spinlocked region
> which needs to allocate memory.
> 
> Doing all that from within an interrupt is daft, because the interrupt code
> can't use GFP_KERNEL anyway.
> 
>> Also it is actually harmful to do preloading if you are in interrupt context
>> anyway. The disadvantage of disallowing radix_tree_preload() in interrupt is
>> that we would need to tweak radix_tree_node_alloc() to somehow recognize
>> whether the caller wants it to use preloaded nodes or not and that callers
>> would have to get it right (although maybe some magic in radix_tree_preload()
>> could handle that).
>>
>> Opinions?
> 
> BUG_ON(in_interrupt()) :)

Good point Andrew, it'd be better to "document" the restriction (since
the use is non-sensical). It's actually not CFQ code that does this,
it's the io context management.

Excuse the crappy mailer, but something ala:

diff --git a/block/blk-ioc.c b/block/blk-ioc.c
index 9c4bb82..bcb9b17 100644
--- a/block/blk-ioc.c
+++ b/block/blk-ioc.c
@@ -366,7 +366,7 @@ struct io_cq *ioc_create_icq(struct io_context *ioc,
struct
        if (!icq)
                return NULL;

-       if (radix_tree_preload(gfp_mask) < 0) {
+       if ((gfp_mask & __GFP_WAIT) && radix_tree_preload(gfp_mask) < 0) {
                kmem_cache_free(et->icq_cache, icq);
                return NULL;
        }
@@ -394,7 +394,10 @@ struct io_cq *ioc_create_icq(struct io_context
*ioc, struct

        spin_unlock(&ioc->lock);
        spin_unlock_irq(q->queue_lock);
-       radix_tree_preload_end();
+
+       if (gfp_mask & __GFP_WAIT)
+               radix_tree_preload_end();
+
        return icq;
 }



-- 
Jens Axboe


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH RFC] lib: Make radix_tree_node_alloc() irq safe
  2013-07-18 13:09   ` Jan Kara
@ 2013-07-18 21:30     ` Jens Axboe
  2013-07-22 15:21       ` Jan Kara
  2013-07-18 21:37     ` Andrew Morton
  1 sibling, 1 reply; 11+ messages in thread
From: Jens Axboe @ 2013-07-18 21:30 UTC (permalink / raw)
  To: Jan Kara; +Cc: Andrew Morton, LKML, linux-mm

On 07/18/2013 07:09 AM, Jan Kara wrote:
> On Wed 17-07-13 16:12:00, Andrew Morton wrote:
>> On Tue, 16 Jul 2013 19:06:30 +0200 Jan Kara <jack@suse.cz> wrote:
>>
>>> With users of radix_tree_preload() run from interrupt (CFQ is one such
>>> possible user), the following race can happen:
>>>
>>> radix_tree_preload()
>>> ...
>>> radix_tree_insert()
>>>   radix_tree_node_alloc()
>>>     if (rtp->nr) {
>>>       ret = rtp->nodes[rtp->nr - 1];
>>> <interrupt>
>>> ...
>>> radix_tree_preload()
>>> ...
>>> radix_tree_insert()
>>>   radix_tree_node_alloc()
>>>     if (rtp->nr) {
>>>       ret = rtp->nodes[rtp->nr - 1];
>>>
>>> And we give out one radix tree node twice. That clearly results in radix
>>> tree corruption with different results (usually OOPS) depending on which
>>> two users of radix tree race.
>>>
>>> Fix the problem by disabling interrupts when working with rtp variable.
>>> In-interrupt user can still deplete our preloaded nodes but at least we
>>> won't corrupt radix trees.
>>>
>>> ...
>>>
>>>   There are some questions regarding this patch:
>>> Do we really want to allow in-interrupt users of radix_tree_preload()?  CFQ
>>> could certainly do this in older kernels but that particular call site where I
>>> saw the bug hit isn't there anymore so I'm not sure this can really happen with
>>> recent kernels.
>>
>> Well, it was never anticipated that interrupt-time code would run
>> radix_tree_preload().  The whole point in the preloading was to be able
>> to perform GFP_KERNEL allocations before entering the spinlocked region
>> which needs to allocate memory.
>>
>> Doing all that from within an interrupt is daft, because the interrupt code
>> can't use GFP_KERNEL anyway.
>   Fully agreed here.
> 
>>> Also it is actually harmful to do preloading if you are in interrupt context
>>> anyway. The disadvantage of disallowing radix_tree_preload() in interrupt is
>>> that we would need to tweak radix_tree_node_alloc() to somehow recognize
>>> whether the caller wants it to use preloaded nodes or not and that callers
>>> would have to get it right (although maybe some magic in radix_tree_preload()
>>> could handle that).
>>>
>>> Opinions?
>>
>> BUG_ON(in_interrupt()) :)
>   Or maybe WARN_ON()... But it's not so easy :) Currently radix tree code
> assumes that if gfp_mask doesn't have __GFP_WAIT set caller has performed
> radix_tree_preload(). Clearly this will stop working for in-interrupt users
> of radix tree. So how do we propagate the information from the caller of
> radix_tree_insert() down to radix_tree_node_alloc() whether the preload has
> been performed or not? Will we rely on in_interrupt() or use some special
> gfp_mask bit?

Should have read the full thread... in_interrupt() is ugly to base
decisions on, imho. I'd say just use __GFP_WAIT to signal this.

> Secondly, CFQ has this unpleasant property that some functions are
> sometimes called from interrupt context and sometimes not. So these
> functions would have to check in what context they are called and either
> perform preload or not. That's doable but it's going to be a bit ugly and
> has to match the check in radix_tree_node_alloc() whether preload should be
> used or not. So leaving the checking to the users of radix tree looks
> fragile.  So maybe we could just silently exit from radix_tree_preload()
> when we are in_interrupt()?

Which CFQ functions are these? Generally we get callbacks from the
drivers on both queue and complete times that can be done at various
contexts, so it's not something that is easily solvable. I'm assuming
you are referring to the blk-ioc.c functions here, though?

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH RFC] lib: Make radix_tree_node_alloc() irq safe
  2013-07-18 13:09   ` Jan Kara
  2013-07-18 21:30     ` Jens Axboe
@ 2013-07-18 21:37     ` Andrew Morton
  2013-07-22 20:30       ` Jan Kara
  1 sibling, 1 reply; 11+ messages in thread
From: Andrew Morton @ 2013-07-18 21:37 UTC (permalink / raw)
  To: Jan Kara; +Cc: LKML, linux-mm, Jens Axboe

On Thu, 18 Jul 2013 15:09:32 +0200 Jan Kara <jack@suse.cz> wrote:

> On Wed 17-07-13 16:12:00, Andrew Morton wrote:
> > On Tue, 16 Jul 2013 19:06:30 +0200 Jan Kara <jack@suse.cz> wrote:
> > 
> > BUG_ON(in_interrupt()) :)
>   Or maybe WARN_ON()... But it's not so easy :) Currently radix tree code
> assumes that if gfp_mask doesn't have __GFP_WAIT set caller has performed
> radix_tree_preload(). Clearly this will stop working for in-interrupt users
> of radix tree. So how do we propagate the information from the caller of
> radix_tree_insert() down to radix_tree_node_alloc() whether the preload has
> been performed or not? Will we rely on in_interrupt() or use some special
> gfp_mask bit?

Well, it won't stop working.  The interrupt-time
radix_tree_node_alloc() call will try to grab a node from the cpu-local
magazine and if that failed, will call kmem_cache_alloc().  Presumably
the caller has passed in GFP_ATOMIC, so the kmem_cache_alloc() will use
page reserves, which seems appropriate.

This will mean that the interrupt-time node allocation will sometimes
steal a preloaded node from process-context code.  In the absolutely
worst case, the process-context code will then need to try
kmem_cache_alloc(), which will probably succeed anyway.

It's not perfect - we'd prefer that process-context node allocations
not get stolen in this fashion.  That's easily fixed with

--- a/lib/radix-tree.c~a
+++ a/lib/radix-tree.c
@@ -207,7 +207,10 @@ radix_tree_node_alloc(struct radix_tree_
 	struct radix_tree_node *ret = NULL;
 	gfp_t gfp_mask = root_gfp_mask(root);
 
-	if (!(gfp_mask & __GFP_WAIT)) {
+	/*
+	 * Lengthy comment goes here
+	 */
+	if (!(gfp_mask & __GFP_WAIT) && !in_interrupt()) {
 		struct radix_tree_preload *rtp;
 
 		/*

But I don't know if it's worth it.

> Secondly, CFQ has this unpleasant property that some functions are
> sometimes called from interrupt context and sometimes not. So these
> functions would have to check in what context they are called and either
> perform preload or not. That's doable but it's going to be a bit ugly and
> has to match the check in radix_tree_node_alloc() whether preload should be
> used or not. So leaving the checking to the users of radix tree looks
> fragile.

mm...  The CFQ code should be passing around a gfp_t anyway - GFP_NOIO
or GFP_ATOMIC, depending on the calling context.  So don't call
radix_tree_preload() if it's GFP_ATOMIC.

> So maybe we could just silently exit from radix_tree_preload()
> when we are in_interrupt()?

Or that.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH RFC] lib: Make radix_tree_node_alloc() irq safe
  2013-07-18 21:30     ` Jens Axboe
@ 2013-07-22 15:21       ` Jan Kara
  2013-07-22 15:38         ` Jens Axboe
  0 siblings, 1 reply; 11+ messages in thread
From: Jan Kara @ 2013-07-22 15:21 UTC (permalink / raw)
  To: Jens Axboe; +Cc: Jan Kara, Andrew Morton, LKML, linux-mm

On Thu 18-07-13 15:30:27, Jens Axboe wrote:
> On 07/18/2013 07:09 AM, Jan Kara wrote:
> > On Wed 17-07-13 16:12:00, Andrew Morton wrote:
> >> On Tue, 16 Jul 2013 19:06:30 +0200 Jan Kara <jack@suse.cz> wrote:
> >>
> >>> With users of radix_tree_preload() run from interrupt (CFQ is one such
> >>> possible user), the following race can happen:
> >>>
> >>> radix_tree_preload()
> >>> ...
> >>> radix_tree_insert()
> >>>   radix_tree_node_alloc()
> >>>     if (rtp->nr) {
> >>>       ret = rtp->nodes[rtp->nr - 1];
> >>> <interrupt>
> >>> ...
> >>> radix_tree_preload()
> >>> ...
> >>> radix_tree_insert()
> >>>   radix_tree_node_alloc()
> >>>     if (rtp->nr) {
> >>>       ret = rtp->nodes[rtp->nr - 1];
> >>>
> >>> And we give out one radix tree node twice. That clearly results in radix
> >>> tree corruption with different results (usually OOPS) depending on which
> >>> two users of radix tree race.
> >>>
> >>> Fix the problem by disabling interrupts when working with rtp variable.
> >>> In-interrupt user can still deplete our preloaded nodes but at least we
> >>> won't corrupt radix trees.
> >>>
> >>> ...
> >>>
> >>>   There are some questions regarding this patch:
> >>> Do we really want to allow in-interrupt users of radix_tree_preload()?  CFQ
> >>> could certainly do this in older kernels but that particular call site where I
> >>> saw the bug hit isn't there anymore so I'm not sure this can really happen with
> >>> recent kernels.
> >>
> >> Well, it was never anticipated that interrupt-time code would run
> >> radix_tree_preload().  The whole point in the preloading was to be able
> >> to perform GFP_KERNEL allocations before entering the spinlocked region
> >> which needs to allocate memory.
> >>
> >> Doing all that from within an interrupt is daft, because the interrupt code
> >> can't use GFP_KERNEL anyway.
> >   Fully agreed here.
> > 
> >>> Also it is actually harmful to do preloading if you are in interrupt context
> >>> anyway. The disadvantage of disallowing radix_tree_preload() in interrupt is
> >>> that we would need to tweak radix_tree_node_alloc() to somehow recognize
> >>> whether the caller wants it to use preloaded nodes or not and that callers
> >>> would have to get it right (although maybe some magic in radix_tree_preload()
> >>> could handle that).
> >>>
> >>> Opinions?
> >>
> >> BUG_ON(in_interrupt()) :)
> >   Or maybe WARN_ON()... But it's not so easy :) Currently radix tree code
> > assumes that if gfp_mask doesn't have __GFP_WAIT set caller has performed
> > radix_tree_preload(). Clearly this will stop working for in-interrupt users
> > of radix tree. So how do we propagate the information from the caller of
> > radix_tree_insert() down to radix_tree_node_alloc() whether the preload has
> > been performed or not? Will we rely on in_interrupt() or use some special
> > gfp_mask bit?
> 
> Should have read the full thread... in_interrupt() is ugly to base
> decisions on, imho. I'd say just use __GFP_WAIT to signal this.
  Yeah, probably that would be nicer and doable in radix_tree_preload().
But in radix_tree_node_alloc() we get gfp_mask as set for the radix tree
and thus it's always going to be GFP_ATOMIC in case of ioc radix tree. So
there we have to have something different if we don't want in interrupt
users to use preallocated nodes. As Andrew suggests maybe that's not
necessary but then my original patch is what you end up with.

> > Secondly, CFQ has this unpleasant property that some functions are
> > sometimes called from interrupt context and sometimes not. So these
> > functions would have to check in what context they are called and either
> > perform preload or not. That's doable but it's going to be a bit ugly and
> > has to match the check in radix_tree_node_alloc() whether preload should be
> > used or not. So leaving the checking to the users of radix tree looks
> > fragile.  So maybe we could just silently exit from radix_tree_preload()
> > when we are in_interrupt()?
> 
> Which CFQ functions are these? Generally we get callbacks from the
> drivers on both queue and complete times that can be done at various
> contexts, so it's not something that is easily solvable. I'm assuming
> you are referring to the blk-ioc.c functions here, though?
  Yes, that's exactly what I'm referring to.

									Honza
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH RFC] lib: Make radix_tree_node_alloc() irq safe
  2013-07-22 15:21       ` Jan Kara
@ 2013-07-22 15:38         ` Jens Axboe
  0 siblings, 0 replies; 11+ messages in thread
From: Jens Axboe @ 2013-07-22 15:38 UTC (permalink / raw)
  To: Jan Kara; +Cc: Andrew Morton, LKML, linux-mm

On Mon, Jul 22 2013, Jan Kara wrote:
> On Thu 18-07-13 15:30:27, Jens Axboe wrote:
> > On 07/18/2013 07:09 AM, Jan Kara wrote:
> > > On Wed 17-07-13 16:12:00, Andrew Morton wrote:
> > >> On Tue, 16 Jul 2013 19:06:30 +0200 Jan Kara <jack@suse.cz> wrote:
> > >>
> > >>> With users of radix_tree_preload() run from interrupt (CFQ is one such
> > >>> possible user), the following race can happen:
> > >>>
> > >>> radix_tree_preload()
> > >>> ...
> > >>> radix_tree_insert()
> > >>>   radix_tree_node_alloc()
> > >>>     if (rtp->nr) {
> > >>>       ret = rtp->nodes[rtp->nr - 1];
> > >>> <interrupt>
> > >>> ...
> > >>> radix_tree_preload()
> > >>> ...
> > >>> radix_tree_insert()
> > >>>   radix_tree_node_alloc()
> > >>>     if (rtp->nr) {
> > >>>       ret = rtp->nodes[rtp->nr - 1];
> > >>>
> > >>> And we give out one radix tree node twice. That clearly results in radix
> > >>> tree corruption with different results (usually OOPS) depending on which
> > >>> two users of radix tree race.
> > >>>
> > >>> Fix the problem by disabling interrupts when working with rtp variable.
> > >>> In-interrupt user can still deplete our preloaded nodes but at least we
> > >>> won't corrupt radix trees.
> > >>>
> > >>> ...
> > >>>
> > >>>   There are some questions regarding this patch:
> > >>> Do we really want to allow in-interrupt users of radix_tree_preload()?  CFQ
> > >>> could certainly do this in older kernels but that particular call site where I
> > >>> saw the bug hit isn't there anymore so I'm not sure this can really happen with
> > >>> recent kernels.
> > >>
> > >> Well, it was never anticipated that interrupt-time code would run
> > >> radix_tree_preload().  The whole point in the preloading was to be able
> > >> to perform GFP_KERNEL allocations before entering the spinlocked region
> > >> which needs to allocate memory.
> > >>
> > >> Doing all that from within an interrupt is daft, because the interrupt code
> > >> can't use GFP_KERNEL anyway.
> > >   Fully agreed here.
> > > 
> > >>> Also it is actually harmful to do preloading if you are in interrupt context
> > >>> anyway. The disadvantage of disallowing radix_tree_preload() in interrupt is
> > >>> that we would need to tweak radix_tree_node_alloc() to somehow recognize
> > >>> whether the caller wants it to use preloaded nodes or not and that callers
> > >>> would have to get it right (although maybe some magic in radix_tree_preload()
> > >>> could handle that).
> > >>>
> > >>> Opinions?
> > >>
> > >> BUG_ON(in_interrupt()) :)
> > >   Or maybe WARN_ON()... But it's not so easy :) Currently radix tree code
> > > assumes that if gfp_mask doesn't have __GFP_WAIT set caller has performed
> > > radix_tree_preload(). Clearly this will stop working for in-interrupt users
> > > of radix tree. So how do we propagate the information from the caller of
> > > radix_tree_insert() down to radix_tree_node_alloc() whether the preload has
> > > been performed or not? Will we rely on in_interrupt() or use some special
> > > gfp_mask bit?
> > 
> > Should have read the full thread... in_interrupt() is ugly to base
> > decisions on, imho. I'd say just use __GFP_WAIT to signal this.
>   Yeah, probably that would be nicer and doable in radix_tree_preload().
> But in radix_tree_node_alloc() we get gfp_mask as set for the radix tree
> and thus it's always going to be GFP_ATOMIC in case of ioc radix tree. So
> there we have to have something different if we don't want in interrupt
> users to use preallocated nodes. As Andrew suggests maybe that's not
> necessary but then my original patch is what you end up with.

At least for the ioc functions, failure can be tolerated (and is
handled). So for this specific case, we don't need to do anything futher
than just the GFP_ATOMIC set in the tree. If that fails, we just don't
link the ioc this time.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH RFC] lib: Make radix_tree_node_alloc() irq safe
  2013-07-18 21:37     ` Andrew Morton
@ 2013-07-22 20:30       ` Jan Kara
  0 siblings, 0 replies; 11+ messages in thread
From: Jan Kara @ 2013-07-22 20:30 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Jan Kara, LKML, linux-mm, Jens Axboe

On Thu 18-07-13 14:37:15, Andrew Morton wrote:
> On Thu, 18 Jul 2013 15:09:32 +0200 Jan Kara <jack@suse.cz> wrote:
> 
> > On Wed 17-07-13 16:12:00, Andrew Morton wrote:
> > > On Tue, 16 Jul 2013 19:06:30 +0200 Jan Kara <jack@suse.cz> wrote:
> > > 
> > > BUG_ON(in_interrupt()) :)
> >   Or maybe WARN_ON()... But it's not so easy :) Currently radix tree code
> > assumes that if gfp_mask doesn't have __GFP_WAIT set caller has performed
> > radix_tree_preload(). Clearly this will stop working for in-interrupt users
> > of radix tree. So how do we propagate the information from the caller of
> > radix_tree_insert() down to radix_tree_node_alloc() whether the preload has
> > been performed or not? Will we rely on in_interrupt() or use some special
> > gfp_mask bit?
> 
> Well, it won't stop working.  The interrupt-time
> radix_tree_node_alloc() call will try to grab a node from the cpu-local
> magazine and if that failed, will call kmem_cache_alloc().  Presumably
> the caller has passed in GFP_ATOMIC, so the kmem_cache_alloc() will use
> page reserves, which seems appropriate.
  True. But then we have to make radix_tree_node_alloc() irq safe as I did
in my patch.

> This will mean that the interrupt-time node allocation will sometimes
> steal a preloaded node from process-context code.  In the absolutely
> worst case, the process-context code will then need to try
> kmem_cache_alloc(), which will probably succeed anyway.
  What I found somewhat nasty about stealing of preallocated nodes by
interrupt is that process-context user can do preload which succeeds so he
assumes radix_tree_node_alloc() cannot fail. If interrupt steals nodes,
radix_tree_node_alloc() can return NULL for process-context user and that
may oops or whatever (I've checked and there are users where this would
currently happen - e.g. mm/vmalloc.c:new_vmap_block()). And it will be
quite hard to track down why that happened.

So the question is more about the status of radix_tree_preload() - do we
make it just an optimization (so that allocations have higher chance of
success / stress the system less) or do we want to guarantee the caller
that nodes are there to be used? In the first case we have to fix some
users of radix_tree_preload() in the second case we have to tweak
radix_tree_node_alloc() like you suggest below.

> It's not perfect - we'd prefer that process-context node allocations
> not get stolen in this fashion.  That's easily fixed with
> 
> --- a/lib/radix-tree.c~a
> +++ a/lib/radix-tree.c
> @@ -207,7 +207,10 @@ radix_tree_node_alloc(struct radix_tree_
>  	struct radix_tree_node *ret = NULL;
>  	gfp_t gfp_mask = root_gfp_mask(root);
>  
> -	if (!(gfp_mask & __GFP_WAIT)) {
> +	/*
> +	 * Lengthy comment goes here
> +	 */
> +	if (!(gfp_mask & __GFP_WAIT) && !in_interrupt()) {
>  		struct radix_tree_preload *rtp;
>  
>  		/*
> 
> But I don't know if it's worth it.
> 
> > Secondly, CFQ has this unpleasant property that some functions are
> > sometimes called from interrupt context and sometimes not. So these
> > functions would have to check in what context they are called and either
> > perform preload or not. That's doable but it's going to be a bit ugly and
> > has to match the check in radix_tree_node_alloc() whether preload should be
> > used or not. So leaving the checking to the users of radix tree looks
> > fragile.
> 
> mm...  The CFQ code should be passing around a gfp_t anyway - GFP_NOIO
> or GFP_ATOMIC, depending on the calling context.  So don't call
> radix_tree_preload() if it's GFP_ATOMIC.
  I've checked and there are other users which can end up calling
radix_tree_preload() with GFP_ATOMIC or similar gfp masks (e.g. some
add_to_swap_cache() users). So I think it would be better to handle that
case inside a common function. Maybe we could have a separate function like
radix_tree_try_preload() which would just skip any preloading if mask
doesn't have __GFP_WAIT set and radix_tree_preload() will complain if it is
passed a gfp mask without __GFP_WAIT. Hm?

								Honza
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2013-07-22 20:30 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-07-16 17:06 [PATCH RFC] lib: Make radix_tree_node_alloc() irq safe Jan Kara
2013-07-17 20:14 ` Jens Axboe
2013-07-17 23:12 ` Andrew Morton
2013-07-17 23:16   ` David Daney
2013-07-18 13:09   ` Jan Kara
2013-07-18 21:30     ` Jens Axboe
2013-07-22 15:21       ` Jan Kara
2013-07-22 15:38         ` Jens Axboe
2013-07-18 21:37     ` Andrew Morton
2013-07-22 20:30       ` Jan Kara
2013-07-18 21:25   ` Jens Axboe

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).