* [PATCH RFC 4/4] irq: allow a per-allocation upper limit when allocating irqs
       [not found] <generic-irq-changes@mdm.bga.com>
  2011-05-25  6:34 ` [PATCH 3/4] irq: remove unnecessary __ref on irq_alloc_descs Milton Miller
@ 2011-05-25  6:34 ` Milton Miller
  2011-05-25  7:55   ` Ingo Molnar
                     ` (2 more replies)
  2011-05-25  6:34 ` [PATCH 1/4] sparse irq: protect irq_to_desc against irq_free_descs Milton Miller
  2011-05-25  6:34 ` [PATCH 2/4] irq: radix_tree_insert can fail Milton Miller
  3 siblings, 3 replies; 12+ messages in thread
From: Milton Miller @ 2011-05-25  6:34 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: linux-kernel, Grant Likely

Allow the option to specify an upper limit to the irq numbers for
each allocation.  The limit is non-inclusive, and 0 means no limit
needed by caller.

Some irq chips can support a relatively large and arbitrary range,
but not infinite.  For example, they may use the linux irq number
as the msi descriptor data, which for msi is 16 bits.

Since e7bcecb7b1 (genirq: Make nr_irqs runtime expandable), checking
NR_IRQS or even nr_irqs is not sufficient to enforce such requirements
when sparse irqs are configured.

Based on an irc discussion, make the limit per call instead of an
arch callback or global setting.

If the requested count is above the limit, return -ENOMEM as if
nr_irqs could not be expanded, assuming the limit was specified at
a different layer than the count.

This code does not try to keep the prior semantics of irq_alloc_descs
when irq was non-negative but not equal to from.  I believe it was an
implementation artifact instead of a designed feature that one could
specify search starting from 4 and fail if the allocated irq was not
exactly 6, confirming that the intervening irqs were reserved.

Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Milton Miller <miltonm@bga.com>


Thomas,

This is still RFC because I haven't gotten far enough in my series
to actually use the patch yet.  If you choose not to apply it now,
should I submit a patch to enforce that irq matches from if it's not -1
(or otherwise negative)?

Contrasting to Grant's earlier proposal, this patch:
(1) has a non-inclusive limit, suitable for showing size of an array etc.
(2) has no redundant parameter for a specific irq to be allocated after inlines
(3) avoids searching areas that will be rejected by calculating the end,
    which also simplifies the search result check.

milton

Index: work.git/include/linux/irq.h
===================================================================
--- work.git.orig/include/linux/irq.h	2011-05-25 01:01:50.230468165 -0500
+++ work.git/include/linux/irq.h	2011-05-25 01:02:11.478479716 -0500
@@ -546,10 +546,20 @@ static inline struct msi_desc *irq_data_
 	return d->msi_desc;
 }
 
-int irq_alloc_descs(int irq, unsigned int from, unsigned int cnt, int node);
+int irq_alloc_descs_range(unsigned int from, unsigned int limit, unsigned int cnt, int node);
 void irq_free_descs(unsigned int irq, unsigned int cnt);
 int irq_reserve_irqs(unsigned int from, unsigned int cnt);
 
+int irq_alloc_descs(int irq, unsigned int from, unsigned int cnt, int node)
+{
+	if (irq < 0)
+		return irq_alloc_descs_range(from, 0, cnt, node);
+	/* fail if specified start at 4 and obtain 6 */
+	if (irq != from)
+		return -EEXIST;
+	return irq_alloc_descs_range(from, from + cnt, cnt, node);
+}
+
 static inline int irq_alloc_desc(int node)
 {
 	return irq_alloc_descs(-1, 0, 1, node);
Index: work.git/kernel/irq/irqdesc.c
===================================================================
--- work.git.orig/kernel/irq/irqdesc.c	2011-05-25 01:02:11.454480436 -0500
+++ work.git/kernel/irq/irqdesc.c	2011-05-25 01:04:03.441480315 -0500
@@ -343,27 +343,37 @@ void irq_free_descs(unsigned int from, u
 EXPORT_SYMBOL_GPL(irq_free_descs);
 
 /**
- * irq_alloc_descs - allocate and initialize a range of irq descriptors
- * @irq:	Allocate for specific irq number if irq >= 0
+ * irq_alloc_descs_range - allocate and initialize a range of irq descriptors
  * @from:	Start the search from this irq number
+ * @limit:	Unless zero, all irq numbers must be less than this value
  * @cnt:	Number of consecutive irqs to allocate.
  * @node:	Preferred node on which the irq descriptor should be allocated
  *
  * Returns the first irq number or error code
  */
-int irq_alloc_descs(int irq, unsigned int from, unsigned int cnt, int node)
+int irq_alloc_descs_range(unsigned int from, unsigned int limit,
+		unsigned int cnt, int node)
 {
-	int start, ret;
+	unsigned int start, end;
+	int ret;
 
 	if (!cnt)
 		return -EINVAL;
+	if (limit) {
+		if (cnt > limit)
+			return -ENOMEM;
+		if (from > limit - cnt)
+			return -EINVAL;
+		end = min_t(unsigned int, limit, IRQ_BITMAP_BITS);
+	} else {
+		end = IRQ_BITMAP_BITS;
+	}
 
 	mutex_lock(&sparse_irq_lock);
 
-	start = bitmap_find_next_zero_area(allocated_irqs, IRQ_BITMAP_BITS,
-					   from, cnt, 0);
+	start = bitmap_find_next_zero_area(allocated_irqs, end, from, cnt, 0);
 	ret = -EEXIST;
-	if (irq >=0 && start != irq)
+	if (start >= end)
 		goto err;
 
 	if (start + cnt > nr_irqs) {
@@ -380,7 +390,7 @@ err:
 	mutex_unlock(&sparse_irq_lock);
 	return ret;
 }
-EXPORT_SYMBOL_GPL(irq_alloc_descs);
+EXPORT_SYMBOL_GPL(irq_alloc_descs_range);
 
 /**
  * irq_reserve_irqs - mark irqs allocated
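
[Editorial sketch, not part of the original message.] The limit semantics described in the changelog (exclusive limit, 0 meaning "no limit", -ENOMEM when the count exceeds the limit, and the wrapper that insists irq equals from) can be tried out in userspace. The following is a minimal model under stated assumptions: MODEL_BITS, model_allocated and the model_* functions are hypothetical stand-ins for IRQ_BITMAP_BITS, allocated_irqs and the kernel functions, and there is no locking, no node argument and no nr_irqs expansion.

```c
#include <errno.h>
#include <stdbool.h>

#define MODEL_BITS 64u			/* stand-in for IRQ_BITMAP_BITS (assumption) */

static bool model_allocated[MODEL_BITS];	/* stand-in for allocated_irqs */

/*
 * First index >= from at which cnt consecutive slots below end are free.
 * Returns end when no such run exists, so the caller can test start >= end.
 */
static unsigned int model_find_zero_area(unsigned int end, unsigned int from,
					 unsigned int cnt)
{
	unsigned int start, i;

	if (cnt > end || from > end - cnt)
		return end;
	for (start = from; start <= end - cnt; start++) {
		for (i = 0; i < cnt && !model_allocated[start + i]; i++)
			;
		if (i == cnt)
			return start;
	}
	return end;
}

/* Mirrors the checks in the patch: limit is exclusive, 0 means no limit. */
int model_alloc_range(unsigned int from, unsigned int limit, unsigned int cnt)
{
	unsigned int start, end, i;

	if (!cnt)
		return -EINVAL;
	if (limit) {
		if (cnt > limit)
			return -ENOMEM;
		if (from > limit - cnt)
			return -EINVAL;
		end = limit < MODEL_BITS ? limit : MODEL_BITS;
	} else {
		end = MODEL_BITS;
	}

	start = model_find_zero_area(end, from, cnt);
	if (start >= end)
		return -EEXIST;

	for (i = 0; i < cnt; i++)
		model_allocated[start + i] = true;
	return (int)start;
}

/* Compatibility wrapper: a non-negative irq must equal from. */
int model_alloc(int irq, unsigned int from, unsigned int cnt)
{
	if (irq < 0)
		return model_alloc_range(from, 0, cnt);
	if ((unsigned int)irq != from)
		return -EEXIST;
	return model_alloc_range(from, from + cnt, cnt);
}
```

In this model a request for a specific irq becomes a search window that is exactly cnt wide, so the search either lands on from or fails; that is why the wrapper no longer needs a separate "start != irq" check after the search.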


* [PATCH 2/4] irq: radix_tree_insert can fail
       [not found] <generic-irq-changes@mdm.bga.com>
                   ` (2 preceding siblings ...)
  2011-05-25  6:34 ` [PATCH 1/4] sparse irq: protect irq_to_desc against irq_free_descs Milton Miller
@ 2011-05-25  6:34 ` Milton Miller
  2011-05-25  8:18   ` Thomas Gleixner
  3 siblings, 1 reply; 12+ messages in thread
From: Milton Miller @ 2011-05-25  6:34 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: linux-kernel

Check the insert, and if it fails, clean up and free all partial work.

Sparse irq was not checking the return code from radix_tree_insert,
but it may need to allocate memory and can fail.  If it failed,
it still claimed success to the caller but the affected irq(s) are
unavailable and the reference to the affected descriptors is leaked.

Signed-off-by: Milton Miller <miltonm@bga.com>
---
I started by trying to change free_desc to take the descriptor pointer
and pushing that down, but that soon ran into conflicts between the
array and sparse implementations, and/or the old dynamic irq cleanup
function that is still used by some architectures.  This version is
targeted, and also protects against scribbles to irq_data.irq.


Index: work.git/kernel/irq/irqdesc.c
===================================================================
--- work.git.orig/kernel/irq/irqdesc.c	2011-05-23 13:46:09.197635762 -0500
+++ work.git/kernel/irq/irqdesc.c	2011-05-23 14:29:22.960588100 -0500
@@ -164,10 +164,8 @@ err_desc:
 	return NULL;
 }
 
-static void free_desc(unsigned int irq)
+static void free_a_desc(unsigned int irq, struct irq_desc *desc)
 {
-	struct irq_desc *desc = irq_to_desc(irq);
-
 	unregister_irq_proc(irq, desc);
 
 	mutex_lock(&sparse_irq_lock);
@@ -179,21 +177,30 @@ static void free_desc(unsigned int irq)
 	kfree(desc);
 }
 
+static void free_desc(unsigned int irq)
+{
+	free_a_desc(irq, irq_to_desc(irq));
+}
+
 static int alloc_descs(unsigned int start, unsigned int cnt, int node)
 {
 	struct irq_desc *desc;
-	int i;
+	int i, res;
 
 	for (i = 0; i < cnt; i++) {
 		desc = alloc_desc(start + i, node);
 		if (!desc)
 			goto err;
 		mutex_lock(&sparse_irq_lock);
-		irq_insert_desc(start + i, desc);
+		res = irq_insert_desc(start + i, desc);
 		mutex_unlock(&sparse_irq_lock);
+		if (res)
+			goto err_insert;
 	}
 	return start;
 
+err_insert:
+	free_a_desc(start + i, desc);
 err:
 	for (i--; i >= 0; i--)
 		free_desc(start + i);
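
[Editorial sketch, not part of the original message.] The unwinding order in this patch matters: the descriptor whose insert failed is not reachable by a lookup, so it has to be freed through the pointer in hand, while the earlier ones can be found and freed by number. A minimal userspace model of that control flow, assuming a fixed array in place of the radix tree; every unwind_* name, the failure hook and the live counter are hypothetical and exist only so the behaviour can be observed:

```c
#include <errno.h>
#include <stdlib.h>

#define UNWIND_SLOTS 8			/* hypothetical table size */

struct unwind_desc {
	unsigned int irq;
};

struct unwind_desc *unwind_table[UNWIND_SLOTS];	/* stand-in for irq_desc_tree */
int unwind_fail_insert_at = -1;	/* test hook: irq whose insert fails */
int unwind_live_descs;		/* descriptors allocated and not yet freed */

/* Stand-in for irq_insert_desc(); fails on demand like an -ENOMEM insert. */
static int unwind_insert_desc(unsigned int irq, struct unwind_desc *desc)
{
	if ((int)irq == unwind_fail_insert_at)
		return -ENOMEM;
	unwind_table[irq] = desc;
	return 0;
}

/* Takes the pointer explicitly, so it works even if the insert never happened. */
static void unwind_free_a_desc(unsigned int irq, struct unwind_desc *desc)
{
	unwind_table[irq] = NULL;
	free(desc);
	unwind_live_descs--;
}

int unwind_alloc_descs(unsigned int start, unsigned int cnt)
{
	struct unwind_desc *desc;
	int i;

	if (start > UNWIND_SLOTS || cnt > UNWIND_SLOTS - start)
		return -EINVAL;

	for (i = 0; i < (int)cnt; i++) {
		desc = malloc(sizeof(*desc));
		if (!desc)
			goto err;
		unwind_live_descs++;
		desc->irq = start + i;
		if (unwind_insert_desc(start + i, desc))
			goto err_insert;
	}
	return (int)start;

err_insert:
	/* this descriptor is not in the table, so a lookup could not find it */
	unwind_free_a_desc(start + i, desc);
err:
	/* unwind the descriptors that were fully set up before the failure */
	for (i--; i >= 0; i--)
		unwind_free_a_desc(start + i, unwind_table[start + i]);
	return -ENOMEM;
}
```

Setting unwind_fail_insert_at to an irq inside the requested range and checking that unwind_live_descs returns to zero is a quick way to confirm that nothing leaks on the failure path.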


* [PATCH 3/4] irq: remove unnecessary __ref on irq_alloc_descs
       [not found] <generic-irq-changes@mdm.bga.com>
@ 2011-05-25  6:34 ` Milton Miller
  2011-05-25  6:34 ` [PATCH RFC 4/4] irq: allow a per-allocation upper limit when allocating irqs Milton Miller
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 12+ messages in thread
From: Milton Miller @ 2011-05-25  6:34 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: linux-kernel

febcb0c59a (irq: Remove unnecessary bootmem code) removed the
slab_is_available() conditional call to alloc_bootmem* but missed
removing the __ref annotation.

Signed-off-by: Milton Miller <miltonm@bga.com>

Index: work.git/kernel/irq/irqdesc.c
===================================================================
--- work.git.orig/kernel/irq/irqdesc.c	2011-05-23 14:29:22.960588100 -0500
+++ work.git/kernel/irq/irqdesc.c	2011-05-23 14:29:27.377586564 -0500
@@ -351,8 +351,7 @@ EXPORT_SYMBOL_GPL(irq_free_descs);
  *
  * Returns the first irq number or error code
  */
-int __ref
-irq_alloc_descs(int irq, unsigned int from, unsigned int cnt, int node)
+int irq_alloc_descs(int irq, unsigned int from, unsigned int cnt, int node)
 {
 	int start, ret;
 


* [PATCH 1/4] sparse irq: protect irq_to_desc against irq_free_descs
       [not found] <generic-irq-changes@mdm.bga.com>
  2011-05-25  6:34 ` [PATCH 3/4] irq: remove unnecessary __ref on irq_alloc_descs Milton Miller
  2011-05-25  6:34 ` [PATCH RFC 4/4] irq: allow a per-allocation upper limit when allocating irqs Milton Miller
@ 2011-05-25  6:34 ` Milton Miller
  2011-05-25  8:14   ` Thomas Gleixner
  2011-05-25  6:34 ` [PATCH 2/4] irq: radix_tree_insert can fail Milton Miller
  3 siblings, 1 reply; 12+ messages in thread
From: Milton Miller @ 2011-05-25  6:34 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: linux-kernel, Paul E. McKenney

The radix-tree code uses call_rcu to delay freeing internal data
elements when deleting an entry.  We must protect
against the elements being freed while we traverse the tree.

While preparing a patch to expand the contexts in which the radix
tree optionally used by powerpc for mapping hardware irq numbers to
linux numbers would be called, I realized that the radix tree was
not locked when radix_tree_lookup was called.  I then realized the
same issue applies to the generic irq code when sparse irqs are in use.

While the powerpc radix tree was only referenced from one callsite
that was irqs_disabled and irq_enter, irq_to_desc is called from
many more contexts including threaded irq handlers and other
process contexts.

This does not show up in the rcu lockdep because in 2.6.34 commit
2676a58c98 (radix-tree: Disable RCU lockdep checking in radix tree)
deemed it too hard to pass the condition of the protecting lock
to the library.

Signed-off-by: Milton Miller <miltonm@bga.com>
Cc: <stable@kernel.org>
---
I expect that the relatively infrequent calls to irq_free_descs, combined
with most calls to irq_to_desc being made with irqs disabled and the fact
that the call_rcu implementations merged to mainline require a cpu to
respond to a hard irq or schedule, have hidden this error to date.

Index: work.git/kernel/irq/irqdesc.c
===================================================================
--- work.git.orig/kernel/irq/irqdesc.c	2011-05-23 13:34:08.728585785 -0500
+++ work.git/kernel/irq/irqdesc.c	2011-05-23 13:46:09.197635762 -0500
@@ -108,7 +108,13 @@ static void irq_insert_desc(unsigned int
 
 struct irq_desc *irq_to_desc(unsigned int irq)
 {
-	return radix_tree_lookup(&irq_desc_tree, irq);
+	struct irq_desc *desc;
+
+	rcu_read_lock();
+	desc = radix_tree_lookup(&irq_desc_tree, irq);
+	rcu_read_unlock();
+
+	return desc
 }
 
 static void delete_irq_desc(unsigned int irq)


* Re: [PATCH RFC 4/4] irq: allow a per-allocation upper limit when allocating irqs
  2011-05-25  6:34 ` [PATCH RFC 4/4] irq: allow a per-allocation upper limit when allocating irqs Milton Miller
@ 2011-05-25  7:55   ` Ingo Molnar
  2011-05-25  8:32   ` Thomas Gleixner
  2011-05-27  3:38   ` Grant Likely
  2 siblings, 0 replies; 12+ messages in thread
From: Ingo Molnar @ 2011-05-25  7:55 UTC (permalink / raw)
  To: Milton Miller; +Cc: Thomas Gleixner, linux-kernel, Grant Likely


* Milton Miller <miltonm@bga.com> wrote:

> Allow the option to specify an upper limit to the irq numbers for
> each allocation.  The limit is non-inclusive, and 0 means no limit
> needed by caller.
> 
> Some irq chips can support a relatively large and arbitrary range,
> but not infinite.  For example, they may use the linux irq number
> as the msi descriptor data, which for msi is 16 bits.
> 
> Since e7bcecb7b1 (genirq: Make nr_irqs runtime expandable), checking
> NR_IRQS or even nr_irqs is not sufficient to enforce such requirements
> when sparse irqs are configured.

Would be nice to add some more background info to the changelog, like what bad 
things happen if this change is not provided and what good things would happen 
if it is provided.

Thanks,

	Ingo


* Re: [PATCH 1/4] sparse irq: protect irq_to_desc against irq_free_descs
  2011-05-25  6:34 ` [PATCH 1/4] sparse irq: protect irq_to_desc against irq_free_descs Milton Miller
@ 2011-05-25  8:14   ` Thomas Gleixner
  2011-05-25 10:49     ` Milton Miller
  0 siblings, 1 reply; 12+ messages in thread
From: Thomas Gleixner @ 2011-05-25  8:14 UTC (permalink / raw)
  To: Milton Miller; +Cc: linux-kernel, Paul E. McKenney

On Wed, 25 May 2011, Milton Miller wrote:
> The radix-tree code uses call_rcu to delay freeing internal data
> elements when deleting an entry.  We must protect
> against the elements being freed while we traverse the tree.
> 
> While preparing a patch to expand the contexts in which the radix
> tree optionally used by powerpc for mapping hardware irq numbers to
> linux numbers would be called, I realized that the radix tree was
> not locked when radix_tree_lookup was called.  I then realized the
> same issue applies to the generic irq code when sparse irqs are in use.
> 
> While the powerpc radix tree was only referenced from one callsite
> that was irqs_disabled and irq_enter, irq_to_desc is called from
> many more contexts including threaded irq handlers and other
> process contexts.
> 
> This does not show up in the rcu lockdep because in 2.6.34 commit
> 2676a58c98 (radix-tree: Disable RCU lockdep checking in radix tree)
> deemed it too hard to pass the condition of the protecting lock
> to the library.
> 
> Signed-off-by: Milton Miller <miltonm@bga.com>
> Cc: <stable@kernel.org>
> ---
> I expect that the relatively infrequent calls to irq_free_descs, combined
> with most calls to irq_to_desc being made with irqs disabled and the fact
> that the call_rcu implementations merged to mainline require a cpu to
> respond to a hard irq or schedule, have hidden this error to date.

The reason why nobody ever noticed is that the free happens in the
teardown path of PCI devices and at this point nothing accesses that
irq anymore.
 
> Index: work.git/kernel/irq/irqdesc.c
> ===================================================================
> --- work.git.orig/kernel/irq/irqdesc.c	2011-05-23 13:34:08.728585785 -0500
> +++ work.git/kernel/irq/irqdesc.c	2011-05-23 13:46:09.197635762 -0500
> @@ -108,7 +108,13 @@ static void irq_insert_desc(unsigned int
>  
>  struct irq_desc *irq_to_desc(unsigned int irq)
>  {
> -	return radix_tree_lookup(&irq_desc_tree, irq);
> +	struct irq_desc *desc;
> +
> +	rcu_read_lock();
> +	desc = radix_tree_lookup(&irq_desc_tree, irq);
> +	rcu_read_unlock();
> +
> +	return desc

That does not really compile :)

And it does not help at all because we unconditionally free the irq
descriptor and do not use rcu based kfree. Further you protect only
the lookup and not the complete section which uses the descriptor, so
it could go away after the rcu_read_unlock() in theory.

Thanks,

	tglx


* Re: [PATCH 2/4] irq: radix_tree_insert can fail
  2011-05-25  6:34 ` [PATCH 2/4] irq: radix_tree_insert can fail Milton Miller
@ 2011-05-25  8:18   ` Thomas Gleixner
  2011-05-25 10:48     ` Milton Miller
  0 siblings, 1 reply; 12+ messages in thread
From: Thomas Gleixner @ 2011-05-25  8:18 UTC (permalink / raw)
  To: Milton Miller; +Cc: linux-kernel

On Wed, 25 May 2011, Milton Miller wrote:

> Check the insert, and if it fails cleanup and free all partial work.
> 
> Sparse irq was not checking the return code from radix_tree_insert,
> but it may need to allocate memory and can fail.  If it failed,
> it still claimed success to the caller but the affected irq(s) are
> unavailable and the reference to the affected descriptors is leaked.
> 
> Signed-off-by: Milton Miller <miltonm@bga.com>
> ---
> I started by trying to change free_desc to take the descriptor pointer
> and pushing that down, but that soon ran into conflicts between the
> array and sparse implementations, and/or the old dynamic irq cleanup
> function that is still used by some architectures.  This version is
> targeted, and also protects against scribbles to irq_data.irq.

The simpler solution is to move irq_insert_desc() into alloc_desc()
and deal with the error case there.

Thanks,

	tglx
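
[Editorial sketch, not part of the original message.] One reading of the suggestion above is that a descriptor should never exist in a "allocated but not inserted" state outside alloc_desc(), which leaves the caller with a single error path. A minimal userspace model of that shape, assuming an array in place of the radix tree; the sketch_* names, the failure hook and the live counter are hypothetical, and the sparse_irq_lock handling that the kernel version would need around the insert is not modeled:

```c
#include <errno.h>
#include <stdlib.h>

#define SKETCH_SLOTS 8			/* hypothetical table size */

struct sketch_desc {
	unsigned int irq;
};

struct sketch_desc *sketch_slot[SKETCH_SLOTS];	/* stand-in for irq_desc_tree */
int sketch_fail_at = -1;	/* test hook: irq whose insert fails */
int sketch_live;		/* descriptors currently published */

static int sketch_insert(unsigned int irq, struct sketch_desc *desc)
{
	if ((int)irq == sketch_fail_at)
		return -ENOMEM;
	sketch_slot[irq] = desc;
	return 0;
}

/* Allocate and publish in one step; on failure nothing is left behind. */
static struct sketch_desc *sketch_alloc_desc(unsigned int irq)
{
	struct sketch_desc *desc = malloc(sizeof(*desc));

	if (!desc)
		return NULL;
	desc->irq = irq;
	if (sketch_insert(irq, desc)) {
		free(desc);	/* insert failed: undo here, not in the caller */
		return NULL;
	}
	sketch_live++;
	return desc;
}

/* Only ever called for descriptors that were successfully published. */
static void sketch_free_desc(unsigned int irq)
{
	free(sketch_slot[irq]);
	sketch_slot[irq] = NULL;
	sketch_live--;
}

int sketch_alloc_descs(unsigned int start, unsigned int cnt)
{
	unsigned int i;

	if (start > SKETCH_SLOTS || cnt > SKETCH_SLOTS - start)
		return -EINVAL;

	for (i = 0; i < cnt; i++)
		if (!sketch_alloc_desc(start + i))
			goto err;
	return (int)start;

err:
	/* one error path: every earlier descriptor is findable by number */
	while (i--)
		sketch_free_desc(start + i);
	return -ENOMEM;
}
```

Compared with the err_insert label in the patch, the caller here never holds a pointer that the table does not know about, so no pointer-taking variant of the free function is needed.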


* Re: [PATCH RFC 4/4] irq: allow a per-allocation upper limit when allocating irqs
  2011-05-25  6:34 ` [PATCH RFC 4/4] irq: allow a per-allocation upper limit when allocating irqs Milton Miller
  2011-05-25  7:55   ` Ingo Molnar
@ 2011-05-25  8:32   ` Thomas Gleixner
  2011-05-27  3:38   ` Grant Likely
  2 siblings, 0 replies; 12+ messages in thread
From: Thomas Gleixner @ 2011-05-25  8:32 UTC (permalink / raw)
  To: Milton Miller; +Cc: linux-kernel, Grant Likely

On Wed, 25 May 2011, Milton Miller wrote:
>  
> +int irq_alloc_descs(int irq, unsigned int from, unsigned int cnt, int node)

You might want this to be inline :)

> +{
> +	if (irq < 0)
> +		return irq_alloc_descs_range(from, 0, cnt, node);
> +	/* fail if specified start at 4 and obtain 6 */

  -ENOPARSE

> +	if (irq != from)
> +		return -EEXIST;

  -EINVAL perhaps ?

> +	return irq_alloc_descs_range(from, from + cnt, cnt, node);
> +}
> +
>  static inline int irq_alloc_desc(int node)
>  {
>  	return irq_alloc_descs(-1, 0, 1, node);
> Index: work.git/kernel/irq/irqdesc.c
> ===================================================================
> --- work.git.orig/kernel/irq/irqdesc.c	2011-05-25 01:02:11.454480436 -0500
> +++ work.git/kernel/irq/irqdesc.c	2011-05-25 01:04:03.441480315 -0500
> @@ -343,27 +343,37 @@ void irq_free_descs(unsigned int from, u
>  EXPORT_SYMBOL_GPL(irq_free_descs);
>  
>  /**
> - * irq_alloc_descs - allocate and initialize a range of irq descriptors
> - * @irq:	Allocate for specific irq number if irq >= 0
> + * irq_alloc_descs_range - allocate and initialize a range of irq descriptors
>   * @from:	Start the search from this irq number
> + * @limit:	Unless zero, all irq numbers must be less than this value
>   * @cnt:	Number of consecutive irqs to allocate.
>   * @node:	Preferred node on which the irq descriptor should be allocated
>   *
>   * Returns the first irq number or error code
>   */
> -int irq_alloc_descs(int irq, unsigned int from, unsigned int cnt, int node)
> +int irq_alloc_descs_range(unsigned int from, unsigned int limit,
> +		unsigned int cnt, int node)
>  {
> -	int start, ret;
> +	unsigned int start, end;
> +	int ret;
>  
>  	if (!cnt)
>  		return -EINVAL;
> +	if (limit) {

Why 0 ? Just use UINT_MAX (or something like IRQ_ALLOC_ANY) for the
limit when you want an unlimited allocation.

> +		if (cnt > limit)
> +			return -ENOMEM;
> +		if (from > limit - cnt)
> +			return -EINVAL;
> +		end = min_t(unsigned int, limit, IRQ_BITMAP_BITS);
> +	} else {
> +		end = IRQ_BITMAP_BITS;
> +	}
>  
>  	mutex_lock(&sparse_irq_lock);
>  
> -	start = bitmap_find_next_zero_area(allocated_irqs, IRQ_BITMAP_BITS,
> -					   from, cnt, 0);
> +	start = bitmap_find_next_zero_area(allocated_irqs, end, from, cnt, 0);
>  	ret = -EEXIST;
> -	if (irq >=0 && start != irq)
> +	if (start >= end)
>  		goto err;

Otherwise I like the approach in general.

Thanks,

	tglx
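
[Editorial sketch, not part of the original message.] The UINT_MAX suggestion above would remove the "if (limit) ... else ..." branch, because an unlimited request then clamps to the bitmap size through the same min() as any other limit. A minimal userspace illustration, assuming hypothetical names throughout: ANY_BITMAP_BITS stands in for IRQ_BITMAP_BITS, ANY_ALLOC_ANY for the proposed IRQ_ALLOC_ANY, and the two helpers are not kernel functions.

```c
#include <errno.h>
#include <limits.h>

#define ANY_BITMAP_BITS	1024u		/* stand-in for IRQ_BITMAP_BITS */
#define ANY_ALLOC_ANY	UINT_MAX	/* hypothetical "no limit" value */

/* Exclusive end of the search window for a given caller limit. */
unsigned int any_search_end(unsigned int limit)
{
	return limit < ANY_BITMAP_BITS ? limit : ANY_BITMAP_BITS;
}

/* 0 if the request could fit below limit, negative errno otherwise. */
int any_check_request(unsigned int from, unsigned int limit, unsigned int cnt)
{
	if (!cnt)
		return -EINVAL;
	if (cnt > limit)
		return -ENOMEM;
	if (from > limit - cnt)		/* cannot underflow: cnt <= limit here */
		return -EINVAL;
	return 0;
}
```

With this encoding a caller that needs irq numbers to fit in 16 bits would pass 1u << 16 as the limit and everyone else would pass the sentinel; the checks are the same in both cases.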


* Re: [PATCH 2/4] irq: radix_tree_insert can fail
  2011-05-25  8:18   ` Thomas Gleixner
@ 2011-05-25 10:48     ` Milton Miller
  0 siblings, 0 replies; 12+ messages in thread
From: Milton Miller @ 2011-05-25 10:48 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: linux-kernel

On Wed, 25 May 2011 about 10:18:40 +0200 (CEST), Thomas Gleixner wrote:
> On Wed, 25 May 2011, Milton Miller wrote:
> 
> > Check the insert, and if it fails cleanup and free all partial work.
> > 
> > Sparse irq was not checking the return code from radix_tree_insert,
> > but it may need to allocate memory and can fail.  If it failed,
> > it still claimed success to the caller but the affected irq(s) are
> > unavailable and the reference to the affected descriptors is leaked.
> > 
> > Signed-off-by: Milton Miller <miltonm@bga.com>
> > ---
> > I started by trying to change free_desc to take the descriptor pointer
> > and pushing that down, but that soon ran into conflicts between the
> > array and sparse implementations, and/or the old dynamic irq cleanup
> > function that is still used by some architectures.  This version is
> > targeted, and also protects against scribbles to irq_data.irq.
> 
> The simpler solution is to move irq_insert_desc() into alloc_desc()
> and deal with the error case there.

And I see that this one too as written does not compile, due to
my faulty config testing.  I'll explore your idea tomorrow after
some sleep, but it makes sense.

milton


* Re: [PATCH 1/4] sparse irq: protect irq_to_desc against irq_free_descs
  2011-05-25  8:14   ` Thomas Gleixner
@ 2011-05-25 10:49     ` Milton Miller
  2011-05-25 10:54       ` Thomas Gleixner
  0 siblings, 1 reply; 12+ messages in thread
From: Milton Miller @ 2011-05-25 10:49 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: linux-kernel, Paul E. McKenney

On Wed, 25 May 2011 about 10:14:20 +0200 (CEST), Thomas Gleixner wrote:
> On Wed, 25 May 2011, Milton Miller wrote:
> > The radix-tree code uses call_rcu to delay freeing internal data
> > elements when deleting an entry.  We must protect
> > against the elements being freed while we traverse the tree.
> > 
> > While preparing a patch to expand the contexts in which the radix
> > tree optionally used by powerpc for mapping hardware irq numbers to
> > linux numbers would be called, I realized that the radix tree was
> > not locked when radix_tree_lookup was called.  I then realized the
> > same issue applies to the generic irq code when sparse irqs are in use.
> > 
> > While the powerpc radix tree was only referenced from one callsite
> > that was irqs_disabled and irq_enter, irq_to_desc is called from
> > many more contexts including threaded irq handlers and other
> > process contexts.
> > 
> > This does not show up in the rcu lockdep because in 2.6.34 commit
> > 2676a58c98 (radix-tree: Disable RCU lockdep checking in radix tree)
> > deemed it too hard to pass the condition of the protecting lock
> > to the library.
> > 
> > Signed-off-by: Milton Miller <miltonm@bga.com>
> > Cc: <stable@kernel.org>
> > ---
> > I expect that the relatively infrequent calls to irq_free_descs, combined
> > with most calls to irq_to_desc being made with irqs disabled and the fact
> > that the call_rcu implementations merged to mainline require a cpu to
> > respond to a hard irq or schedule, have hidden this error to date.
> 
> The reason why nobody ever noticed is that the free happens in the
> teardown path of PCI devices and at this point nothing accesses that
> irq anymore.

I'm not talking about the irq_desc that is being removed from the
tree, instead I'm talking about the internal elements in the
radix tree that are used to find the pointers at the bottom of
the tree (that in turn point to the irq_desc in the irq tree case).

>  
> 
> > Index: work.git/kernel/irq/irqdesc.c
> > ===================================================================
> > --- work.git.orig/kernel/irq/irqdesc.c	2011-05-23 13:34:08.728585785 -0500
> > +++ work.git/kernel/irq/irqdesc.c	2011-05-23 13:46:09.197635762 -0500
> > @@ -108,7 +108,13 @@ static void irq_insert_desc(unsigned int
> >  
> >  struct irq_desc *irq_to_desc(unsigned int irq)
> >  {
> > -	return radix_tree_lookup(&irq_desc_tree, irq);
> > +	struct irq_desc *desc;
> > +
> > +	rcu_read_lock();
> > +	desc = radix_tree_lookup(&irq_desc_tree, irq);
> > +	rcu_read_unlock();
> > +
> > +	return desc
> 
> That does not really compile :)

Hmm, Ooops I thought I had sparse irq enabled, but I had

CONFIG_HAVE_SPARSE_IRQ=y
# CONFIG_SPARSE_IRQ is not set

and indeed, I am missing a semicolon.  Sorry about that.

> 
> And it does not help at all because we unconditionally free the irq
> descriptor and do not use rcu based kfree. Further you protect only

As I said above, it's the actual radix tree nodes leading to the descriptors,
which are currently freed with call_rcu, that I'm trying to cover.

> the lookup and not the complete section which uses the descriptor, so
> it could go away after the rcu_read_unlock() in theory.

Presently there is no locking in the generic layer between allocating
a descriptor and allowing the irq to be used; that seems to be left
to the code in the various architectures that call irq_alloc_descs.

In fact the generic layer doesn't check if there are irq actions
chained off the irq, let alone if the irq is in progress or the thread
is active.  While I agree it may be prudent to add such locking,
it is a separate issue beyond traversing the radix tree.

Most of the architecture code I've seen so far doesn't check either.
Powerpc calls synchronize_irq, but that leaves windows and doesn't
guard against actions being registered.  Otherwise it's up to the
callers to not request an irq be freed before it is shutdown.

Back to this patch.

Depending on the tree I believe (from my partial reading of the radix
tree code) an element being deleted could collapse the path to adjacent
entries.  The fact that the tree will not always collapse will further
reduce the incidence of an error being detected and reported beyond
the fact that a lookup has to be delayed across an rcu boundary, which
the merged mainline rcu schemes will not do with hard irq disabled.

So one needs (1) the ability to collapse a tree node and (2) an
irq descriptor lookup to traverse that node either (3a) not from an
irq handler, but an irq thread or other process or bh context, or
(3b) with an out-of-tree rcu such as the Concurrent RT rcu that Paul
mentioned, and (4) the free to result in the memory used by the lookup
traversal being overwritten.  Rare, but probably not impossible.

milton


* Re: [PATCH 1/4] sparse irq: protect irq_to_desc against irq_free_descs
  2011-05-25 10:49     ` Milton Miller
@ 2011-05-25 10:54       ` Thomas Gleixner
  0 siblings, 0 replies; 12+ messages in thread
From: Thomas Gleixner @ 2011-05-25 10:54 UTC (permalink / raw)
  To: Milton Miller; +Cc: linux-kernel, Paul E. McKenney

On Wed, 25 May 2011, Milton Miller wrote:
> On Wed, 25 May 2011 about 10:14:20 +0200 (CEST), Thomas Gleixner wrote:
> > 
> > The reason why nobody ever noticed is that the free happens in the
> > teardown path of PCI devices and at this point nothing accesses that
> > irq anymore.
> 
> I'm not talking about the irq_desc that is being removed from the
> tree, instead I'm talking about the internal elements in the
> radix tree that are used to find the pointers at the bottom of
> the tree (that in turn point to the irq_desc in the irq tree case).

Oops, did not think about that one :)

-ENOTENOUGHCOFFEE

Thanks,

	tglx


* Re: [PATCH RFC 4/4] irq: allow a per-allocation upper limit when allocating irqs
  2011-05-25  6:34 ` [PATCH RFC 4/4] irq: allow a per-allocation upper limit when allocating irqs Milton Miller
  2011-05-25  7:55   ` Ingo Molnar
  2011-05-25  8:32   ` Thomas Gleixner
@ 2011-05-27  3:38   ` Grant Likely
  2 siblings, 0 replies; 12+ messages in thread
From: Grant Likely @ 2011-05-27  3:38 UTC (permalink / raw)
  To: Milton Miller; +Cc: Thomas Gleixner, linux-kernel

On Wed, May 25, 2011 at 01:34:18AM -0500, Milton Miller wrote:
> Allow the option to specify an upper limit to the irq numbers for
> each allocation.  The limit is non-inclusive, and 0 means no limit
> needed by caller.
> 
> Some irq chips can support a relatively large and arbitrary range,
> but not infinite.  For example, they may use the linux irq number
> as the msi descriptor data, which for msi is 16 bits.
> 
> Since e7bcecb7b1 (genirq: Make nr_irqs runtime expandable), checking
> NR_IRQS or even nr_irqs is not sufficient to enforce such requirements
> when sparse irqs are configured.
> 
> Based on an irc discussion, make the limit per call instead of an
> arch callback or global setting.
> 
> If the requested count is above the limit, return -ENOMEM as if
> nr_irqs could not be expanded, assuming the limit was specified at
> a different layer than the count.
> 
> This code does not try to keep the prior semantics of irq_alloc_descs
> when irq was non-negative but not equal to from.  I believe it was an
> implementation artifact instead of a designed feature that one could
> specify search starting from 4 and fail if the allocated irq was not
> exactly 6, confirming that the intervening irqs were reserved.
> 
> Suggested-by: Thomas Gleixner <tglx@linutronix.de>
> Signed-off-by: Milton Miller <miltonm@bga.com>

Looks good to me.

g.


