linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [RFC][PATCH] Cascaded interrupts: a simple solution
@ 2006-05-11  5:26 Benjamin Herrenschmidt
  2006-05-11  5:41 ` [PATCH] PowerMac use of SA_CASCADEIRQ Benjamin Herrenschmidt
                   ` (2 more replies)
  0 siblings, 3 replies; 5+ messages in thread
From: Benjamin Herrenschmidt @ 2006-05-11  5:26 UTC (permalink / raw)
  To: Linux Kernel list
  Cc: Linus Torvalds, Paul Mackerras, Andrew Morton, Ingo Molnar,
	Thomas Gleixner

Hi !

While working on some major cleanup of the powerpc architecture
interrupt handling, I found out that I could easily solve one of our
problems with a small addition to the interrupt core. This will probably
be useful to other architectures as well.

It's quite common in ppc-land to have cascaded interrupt controllers.
That is, a given interrupt is actually the output of another interrupt
controller (a demultiplexer if you prefer). Currently, the typical
setups are a master OpenPIC/MPIC with a cascaded i8259, or on PowerMacs,
a MPICs and a slave MPIC. However, Cell is also bringing similar
cascades into the picture (one iic per thread and cascaded south bridge
interrupts, from the spider chip for now and others in the future) and
it's also a common setup on embedded.

At the moment, we handled that in a fairly dirty way. That is, the
interrupt controller "driver" would have a magic "hook" for setting up a
cascade callback on one of its interrupt sources. In some cases (like
Cell), it's even hard coded (one interrupt controller driver calls
directly into another one whem the interrupt is the cascade).

I figured out that it would be very simple to make that much cleaner and
generic by simply defining a new interrupt flag SA_CASCADEIRQ to attach
to the cascade irq, and have the handler fetch the interrupts from the
slave controller and return them to the core.

This patch is an implementation that was quickly tested on G5 (a second
patch to use that for the G5 follows) and seems to work fine. I changed
irqreturn_t to be an unsigned int to better match the type of "irq" in
most cases. It also relies on the fact that irq 0 (which matches the
constant IRQ_NONE) is invalid (at least is never a valid cascaded
interrupt) which matches a decision made by Linus a while ago when
talking about introducing a generic NO_IRQ.

The reason I made it recursive is that I found any other implementation
attempts I did too ugly for words, it's much more cleaner that way, and
it only recurses as much as there are levels of IRQ cascaded controllers
which I don't expect to ever be more than 2 or 3. Since the involved
functions have very little locals allocated, I decided it was good
enough that way.

Note the fact that it's looping on the handler until IRQ_NONE is
returned, since a cascaded controller might issue only one irq upstream
for any combination of downstream interrupts, it must be "polled" until
it has no more interrupt to return.

If you are happy with this patch, please consider for 2.6.18 and I'll
convert the rest of the powerpc architecture to use this mecanism as
part of my cleanup.

Cheers,
Ben.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>

Index: linux-work/include/linux/interrupt.h
===================================================================
--- linux-work.orig/include/linux/interrupt.h	2006-01-14 14:43:35.000000000 +1100
+++ linux-work/include/linux/interrupt.h	2006-05-11 12:01:32.000000000 +1000
@@ -27,8 +27,13 @@
  * IRQ_NONE means we didn't handle it.
  * IRQ_HANDLED means that we did have a valid interrupt and handled it.
  * IRQ_RETVAL(x) selects on the two depending on x being non-zero (for handled)
+ *
+ * A cascade interrupt (using SA_CASCADE) returns IRQ_NONE or a valid interrupt
+ * number. It's assumed that 0 is not a valid cascaded interrupt number. As
+ * per prior discussion with Linus, all archs should migrate to have 0 not be
+ * a valid interrupt number at all anyway.
  */
-typedef int irqreturn_t;
+typedef unsigned int irqreturn_t;
 
 #define IRQ_NONE	(0)
 #define IRQ_HANDLED	(1)
Index: linux-work/include/linux/signal.h
===================================================================
--- linux-work.orig/include/linux/signal.h	2006-05-11 11:45:09.000000000 +1000
+++ linux-work/include/linux/signal.h	2006-05-11 11:49:38.000000000 +1000
@@ -15,8 +15,10 @@
  * SA_INTERRUPT is also used by the irq handling routines.
  * SA_SHIRQ is for shared interrupt support on PCI and EISA.
  * SA_PROBEIRQ is set by callers when they expect sharing mismatches to occur
+ * SA_CASCADE is set for a cascade (demultiplexer) interrupt handler
  */
 #define SA_SAMPLE_RANDOM	SA_RESTART
+#define SA_CASCADEIRQ		0x02000000
 #define SA_SHIRQ		0x04000000
 #define SA_PROBEIRQ		0x08000000
 
Index: linux-work/kernel/irq/handle.c
===================================================================
--- linux-work.orig/kernel/irq/handle.c	2006-01-14 14:43:37.000000000 +1100
+++ linux-work/kernel/irq/handle.c	2006-05-11 12:03:41.000000000 +1000
@@ -79,13 +79,19 @@ irqreturn_t no_action(int cpl, void *dev
 fastcall int handle_IRQ_event(unsigned int irq, struct pt_regs *regs,
 				struct irqaction *action)
 {
-	int ret, retval = 0, status = 0;
+	unsigned int ret;
+	int retval = 0, status = 0;
 
 	if (!(action->flags & SA_INTERRUPT))
 		local_irq_enable();
 
 	do {
 		ret = action->handler(irq, action->dev_id, regs);
+		if ((action->flags & SA_CASCADEIRQ) && (ret != IRQ_NONE)) {
+			retval |= IRQ_HANDLED;
+			__do_IRQ(ret, regs);
+			continue;
+		}
 		if (ret == IRQ_HANDLED)
 			status |= action->flags;
 		retval |= ret;
Index: linux-work/kernel/irq/manage.c
===================================================================
--- linux-work.orig/kernel/irq/manage.c	2006-05-11 11:45:09.000000000 +1000
+++ linux-work/kernel/irq/manage.c	2006-05-11 12:00:52.000000000 +1000
@@ -371,6 +371,13 @@ int request_irq(unsigned int irq,
 	if (!handler)
 		return -EINVAL;
 
+	/*
+	 * SA_CASCADE implies SA_INTERRUPT (that is, the demux itself happens
+	 * with interrupts disabled, the final handler is still called with
+	 */
+	if (irqflags & SA_CASCADEIRQ)
+		irqflags |= SA_INTERRUPT;
+
 	action = kmalloc(sizeof(struct irqaction), GFP_ATOMIC);
 	if (!action)
 		return -ENOMEM;
Index: linux-work/include/linux/irq.h
===================================================================
--- linux-work.orig/include/linux/irq.h	2006-03-31 12:13:05.000000000 +1100
+++ linux-work/include/linux/irq.h	2006-05-11 14:34:58.000000000 +1000
@@ -172,7 +172,8 @@ extern fastcall int handle_IRQ_event(uns
 					struct irqaction *action);
 extern fastcall unsigned int __do_IRQ(unsigned int irq, struct pt_regs *regs);
 extern void note_interrupt(unsigned int irq, irq_desc_t *desc,
-					int action_ret, struct pt_regs *regs);
+					unsigned int  action_ret,
+			   		struct pt_regs *regs);
 extern int can_request_irq(unsigned int irq, unsigned long irqflags);
 
 extern void init_irq_proc(void);
Index: linux-work/kernel/irq/spurious.c
===================================================================
--- linux-work.orig/kernel/irq/spurious.c	2006-01-14 14:43:37.000000000 +1100
+++ linux-work/kernel/irq/spurious.c	2006-05-11 14:35:21.000000000 +1000
@@ -133,7 +133,7 @@ static void report_bad_irq(unsigned int 
 	}
 }
 
-void note_interrupt(unsigned int irq, irq_desc_t *desc, irqreturn_t action_ret,
+void note_interrupt(unsigned int irq, irq_desc_t *desc, unsigned int action_ret,
 			struct pt_regs *regs)
 {
 	if (action_ret != IRQ_HANDLED) {



^ permalink raw reply	[flat|nested] 5+ messages in thread

* [PATCH] PowerMac use of SA_CASCADEIRQ
  2006-05-11  5:26 [RFC][PATCH] Cascaded interrupts: a simple solution Benjamin Herrenschmidt
@ 2006-05-11  5:41 ` Benjamin Herrenschmidt
  2006-05-11  6:13 ` [RFC][PATCH] Cascaded interrupts: a simple solution Benjamin Herrenschmidt
  2006-05-11  9:37 ` Ingo Molnar
  2 siblings, 0 replies; 5+ messages in thread
From: Benjamin Herrenschmidt @ 2006-05-11  5:41 UTC (permalink / raw)
  To: Linux Kernel list
  Cc: Linus Torvalds, Paul Mackerras, Andrew Morton, Ingo Molnar,
	Thomas Gleixner

On Thu, 2006-05-11 at 15:26 +1000, Benjamin Herrenschmidt wrote:

> This patch is an implementation that was quickly tested on G5 (a second
> patch to use that for the G5 follows) and seems to work fine.

Here is the patch implementing that for the cascaced MPIC. Seems to work
fine here. (This is not useful on Quad G5s btw, only older ones really
use a cascade). This is more a test patch, not intended for merging.
I'll do a proper one if the support for SA_CASCADEIRQ goes in. There are
other issues with the way the PowerMac codes uses static irqaction
structures & directly calls setup_irq() in order to be able to install
irq actions before kmalloc is available. I want to make request_irq()
useable earlier instead, by having fallback on bootmem alloc, it make a
lot of sense for low level architecture interrupts that will never be
freed. I also noticed that the irq proc stuff has an issue with irqs
created before /proc is initialized, it will create the /proc/irq/* but
nothing below for existing actions. I'll look into fixing that as well.

Index: linux-work/arch/powerpc/platforms/powermac/pic.c
===================================================================
--- linux-work.orig/arch/powerpc/platforms/powermac/pic.c	2006-01-14 14:43:22.000000000 +1100
+++ linux-work/arch/powerpc/platforms/powermac/pic.c	2006-05-11 15:27:58.000000000 +1000
@@ -199,7 +199,7 @@ struct hw_interrupt_type gatwick_pic = {
 
 static irqreturn_t gatwick_action(int cpl, void *dev_id, struct pt_regs *regs)
 {
-	int irq, bits;
+	unsigned int irq, bits;
 
 	for (irq = max_irqs; (irq -= 32) >= max_real_irqs; ) {
 		int i = irq >> 5;
@@ -210,8 +210,7 @@ static irqreturn_t gatwick_action(int cp
 		if (bits == 0)
 			continue;
 		irq += __ilog2(bits);
-		__do_IRQ(irq, regs);
-		return IRQ_HANDLED;
+		return irq;
 	}
 	printk("gatwick irq not from gatwick pic\n");
 	return IRQ_NONE;
@@ -382,7 +381,7 @@ static struct irqaction xmon_action = {
 
 static struct irqaction gatwick_cascade_action = {
 	.handler	= gatwick_action,
-	.flags		= SA_INTERRUPT,
+	.flags		= SA_CASCADEIRQ | SA_INTERRUPT,
 	.mask		= CPU_MASK_NONE,
 	.name		= "cascade",
 };
@@ -503,11 +502,6 @@ static void __init pmac_pic_probe_oldsty
 }
 #endif /* CONFIG_PPC32 */
 
-static int pmac_u3_cascade(struct pt_regs *regs, void *data)
-{
-	return mpic_get_one_irq((struct mpic *)data, regs);
-}
-
 static void __init pmac_pic_setup_mpic_nmi(struct mpic *mpic)
 {
 #if defined(CONFIG_XMON) && defined(CONFIG_PPC32)
@@ -562,12 +556,28 @@ static struct mpic * __init pmac_setup_o
 	mpic_init(mpic);
 
 	return mpic;
- }
+}
+
+static irqreturn_t mpic_cascade(int irq, void *dev_id, struct pt_regs *regs)
+{
+	int ret = mpic_get_one_irq(dev_id, regs);
+	if (unlikely(ret < 0))
+		return IRQ_NONE;
+	return ret;
+}
+
+static struct irqaction mpic_cascade_action = {
+	.handler	= mpic_cascade,
+	.flags		= SA_INTERRUPT | SA_CASCADEIRQ,
+	.mask		= CPU_MASK_NONE,
+	.name		= "cascade",
+};
 
 static int __init pmac_pic_probe_mpic(void)
 {
 	struct mpic *mpic1, *mpic2;
 	struct device_node *np, *master = NULL, *slave = NULL;
+	int rc;
 
 	/* We can have up to 2 MPICs cascaded */
 	for (np = NULL; (np = of_find_node_by_type(np, "open-pic"))
@@ -613,7 +623,14 @@ static int __init pmac_pic_probe_mpic(vo
 		of_node_put(slave);
 		return 0;
 	}
-	mpic_setup_cascade(slave->intrs[0].line, pmac_u3_cascade, mpic2);
+
+	printk(KERN_DEBUG "Setting up MPIC 2 cascade on irq %d\n",
+	       slave->intrs[0].line);
+	mpic_cascade_action.dev_id = mpic2;
+	rc = setup_irq(slave->intrs[0].line, &mpic_cascade_action);
+	if (rc)
+		printk(KERN_DEBUG "Failed setting up MPIC 2 cascade on irq %d"
+		       " error %d\n", slave->intrs[0].line, rc);
 
 	of_node_put(slave);
 	return 0;
Index: linux-work/kernel/irq/proc.c
===================================================================
--- linux-work.orig/kernel/irq/proc.c	2006-01-14 14:43:37.000000000 +1100
+++ linux-work/kernel/irq/proc.c	2006-05-11 14:55:39.000000000 +1000
@@ -166,6 +166,8 @@ void init_irq_proc(void)
 
 	/*
 	 * Create entries for all existing IRQs.
+	 *
+	 * FIXME: Call register_handler_proc() for existing actions
 	 */
 	for (i = 0; i < NR_IRQS; i++)
 		register_irq_proc(i);

 



^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [RFC][PATCH] Cascaded interrupts: a simple solution
  2006-05-11  5:26 [RFC][PATCH] Cascaded interrupts: a simple solution Benjamin Herrenschmidt
  2006-05-11  5:41 ` [PATCH] PowerMac use of SA_CASCADEIRQ Benjamin Herrenschmidt
@ 2006-05-11  6:13 ` Benjamin Herrenschmidt
  2006-05-11  9:37 ` Ingo Molnar
  2 siblings, 0 replies; 5+ messages in thread
From: Benjamin Herrenschmidt @ 2006-05-11  6:13 UTC (permalink / raw)
  To: Linux Kernel list
  Cc: Linus Torvalds, Paul Mackerras, Andrew Morton, Ingo Molnar,
	Thomas Gleixner


> Index: linux-work/kernel/irq/manage.c
> ===================================================================
> --- linux-work.orig/kernel/irq/manage.c	2006-05-11 11:45:09.000000000 +1000
> +++ linux-work/kernel/irq/manage.c	2006-05-11 12:00:52.000000000 +1000
> @@ -371,6 +371,13 @@ int request_irq(unsigned int irq,
>  	if (!handler)
>  		return -EINVAL;
>  
> +	/*
> +	 * SA_CASCADE implies SA_INTERRUPT (that is, the demux itself happens
> +	 * with interrupts disabled, the final handler is still called with
> +	 */
> +	if (irqflags & SA_CASCADEIRQ)
> +		irqflags |= SA_INTERRUPT;
> +

Oh and that bit isn't actually necessary ...

Ben.



^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [RFC][PATCH] Cascaded interrupts: a simple solution
  2006-05-11  5:26 [RFC][PATCH] Cascaded interrupts: a simple solution Benjamin Herrenschmidt
  2006-05-11  5:41 ` [PATCH] PowerMac use of SA_CASCADEIRQ Benjamin Herrenschmidt
  2006-05-11  6:13 ` [RFC][PATCH] Cascaded interrupts: a simple solution Benjamin Herrenschmidt
@ 2006-05-11  9:37 ` Ingo Molnar
  2006-05-11 11:47   ` Benjamin Herrenschmidt
  2 siblings, 1 reply; 5+ messages in thread
From: Ingo Molnar @ 2006-05-11  9:37 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: Linux Kernel list, Linus Torvalds, Paul Mackerras, Andrew Morton,
	Thomas Gleixner, Russell King


* Benjamin Herrenschmidt <benh@kernel.crashing.org> wrote:

> I figured out that it would be very simple to make that much cleaner 
> and generic by simply defining a new interrupt flag SA_CASCADEIRQ to 
> attach to the cascade irq, and have the handler fetch the interrupts 
> from the slave controller and return them to the core.

> Note the fact that it's looping on the handler until IRQ_NONE is 
> returned, since a cascaded controller might issue only one irq 
> upstream for any combination of downstream interrupts, it must be 
> "polled" until it has no more interrupt to return.

We have solved this problem in the genirq patchset, but in a different 
way. No matter how much i'd like to see a simple solution for a hard 
problem, we believe the SA_CASCADEIRQ method is insufficient, for a 
number of reasons.

The main goal of genirq: to merge the ARM architecture to the generic 
IRQ layer and thus make the generic IRQ layer truly generic. Secondary 
goal: to not disturb any of the current genirq architectures (i.e. stay 
fully compatible). The ARM IRQ layer is exotic in some respects, but it 
is certainly the most advanced IRQ layer in terms of PIC-topology 
handling, given the rich variety of ARM hardware in existence.

Just some stats to back this up: arch/arm*/ has 118 separate PIC 
implementations, and amongst them there are more than 40 that need 
demultiplex handlers. As a comparison, arch/[ppc/powerpc] sports 27 PIC 
implementations, amongst which there are 5 that need demultiplex 
handlers. arch/i386 has 10 PIC implementations and none of them need 
demultiplex handlers [there are minor forms of cascading in x86 
hardware, but none need true irq-vector demultiplexing].

Most architectures modeled their IRQ layer after i386 IRQ layer, and 
this resulted in the first phase of generic IRQ layer being quite 
similar to the i386 layer. But the largest and most versatile Linux IRQ 
subsystem was left out of the original genirq design (done by yours 
truly), and many good bits (amongst them the right way to handle 
cascading/chaining) i missed.

Current status of genirq: it has been part of the -rt tree for more than 
a year, and an earlier version has been sent to lkml once already - 
Thomas has submitted an improved version of it to the ARM lists roughly 
a week ago and it is currently being tested on many ARM boards, with 
good results.

(I suspect you are aware of our genirq efforts, hence did you Cc: Thomas 
and me? If you are aware of it, please address the differences between 
the two approaches and outline why you chose a different solution - 
thanks!)

Most PIC implementations do not need to worry about PIC cascading, and 
neither the SA_CASCADEIRQ nor the genirq approach impacts them. So my 
analysis of your patch only involves true cascaded PICs and the ways to 
support them cleanly.

For cascaded PICs, the main problem with the SA_CASCADEIRQ approach is 
that it assumes that cascading/demultiplexing can be handled via a 
"return irq" method, in an iterative way.

But this does not match the demultiplexing model that happens on the 
majority of ARM boards for example: there a bitmask is read from the 
secondary interrupt controller, and that bitmask might have multiple 
bits set. By returning only one bit [which iteration model your 
interface forces], the other bits can be lost. On some hardware those 
missed bits might be regenerated, but there is PIC hardware where that 
information is permanently lost and we end up losing interrupts.

Such mask-based demultiplexing PICs are not limited to ARM, they occur 
in the PPC world too, for example in arch/ppc/syslib/m82xx_pci.c, 
pq2pci_irq_demux():

 static irqreturn_t
 pq2pci_irq_demux(int irq, void *dev_id, struct pt_regs *regs)
 {
         unsigned long stat, mask, pend;
         int bit;

         for(;;) {
                 stat = *(volatile unsigned long *) PCI_INT_STAT_REG;
                 mask = *(volatile unsigned long *) PCI_INT_MASK_REG;
                 pend = stat & ~mask & 0xf0000000;
                 if (!pend)
                         break;
                 for (bit = 0; pend != 0; ++bit, pend <<= 1) {
                         if (pend & 0x80000000)
                                 __do_IRQ(NR_CPM_INTS + bit, regs);
                 }
         }

         return IRQ_HANDLED;
 }

If PCI_INT_MASK_REG is a read-once register, then the SA_CASCADEIRQ 
method could result in lost interrupts. (i'm not totally sure, but i 
think in this specific case that register is read-once and it also 
involves an auto-ack. In any case, there definitely is ARM hardware 
where this equivalent register is read-once.)

Even if PCI_INT_MASK_REG could be read in a non-destructive way, 
multiple bits would need multiple iterations and multiple (unnecessary) 
passes over the whole bitmask - and they would thus also need 
unnecessary IO cycles to re-fetch the mask itself.

(Depending on how many different interrupt sources a secondary PIC 
connects, and how frequently those are risen, this might or might not be 
a real performance problem. But in any case, the "return irq" method is 
certainly an ugly and unnatural model for such PICs and there is no 
clean way to handle this type of cascading via the 'return irq' method.)

So the solution we took in genirq was to delegate the act of 
demultiplexing into the _demultiplexing handler_, by adopting the ARM
IRQ layer's approach of calling desc_handle_irq(irq, desc, regs). For
example:

 static void locomo_gpio_handler(unsigned int irq, struct irqdesc *desc,
                                 struct pt_regs *regs)
 {
 [...]
                irq = LOCOMO_IRQ_GPIO_START;
                d = irq_desc + LOCOMO_IRQ_GPIO_START;
                for (i = 0; i <= 15; i++, irq++, d++) {
                        if (req & (0x0001 << i)) {
                                desc_handle_irq(irq, d, regs);
                        }
                }
 [...]
 }

desc_handle_irq() does the locking and the calling of the highlevel irq 
handler of the secondary PIC. [which then processes the device IRQ 
handler actions and does any pre/post ACKing logic] There is no impact 
to non-cascading PIC designs in this model either.

Another, conceptual level problem is that (ab-)using the irq handler 
methods to return an actual interrupt number is a layering violation. 
There is not much in common between an IRQ demultiplexer function and an 
interrupt handler function. The act of cascading two PICs _inevitably_ 
means that there is a 1:N relationship between the two PICs, while an 
IRQ handler is normally a 1:1 relationship between a hardware device and 
a PIC. (there are exceptions like shared interrupts, but the norm we are 
designing for is a 1:1 relationship) There are many other fundamental 
differences too. Thus in our genirq work we have separated these two 
concepts.

In terms of patch merging (unless there are some arguments i've 
overlooked), due to the reasons above i'm against merging SA_CASCADEIRQ, 
even if it's relatively simple. It does not solve the demultiplexing 
problem for most of the ARM handlers (nor for the PPC example i cited), 
hence a separate variant has to be implemented anyway which results in 
unnecessary code and concept duplication.

The current PPC/powerpc approach of calling __do_IRQ() might be hacky, 
but it's functional, so neither is there any instant urgency AFAICS. If 
our more complete genirq approach is rejected for whatever reason then 
the SA_CASCADEIRQ patch can still be revisited as a secondary choice.

Nor can i see any big cleanup effect in terms of per-arch PIC code, the 
patch actually adds a bit of code:

 arch/powerpc/platforms/powermac/pic.c |   39 ++++++++++++++++++++++++----------
 1 file changed, 28 insertions(+), 11 deletions(-)

	Ingo

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [RFC][PATCH] Cascaded interrupts: a simple solution
  2006-05-11  9:37 ` Ingo Molnar
@ 2006-05-11 11:47   ` Benjamin Herrenschmidt
  0 siblings, 0 replies; 5+ messages in thread
From: Benjamin Herrenschmidt @ 2006-05-11 11:47 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Linux Kernel list, Linus Torvalds, Paul Mackerras, Andrew Morton,
	Thomas Gleixner, Russell King


> We have solved this problem in the genirq patchset, but in a different 
> way. No matter how much i'd like to see a simple solution for a hard 
> problem, we believe the SA_CASCADEIRQ method is insufficient, for a 
> number of reasons.
> 
> The main goal of genirq: to merge the ARM architecture to the generic 
> IRQ layer and thus make the generic IRQ layer truly generic. Secondary 
> goal: to not disturb any of the current genirq architectures (i.e. stay 
> fully compatible). The ARM IRQ layer is exotic in some respects, but it 
> is certainly the most advanced IRQ layer in terms of PIC-topology 
> handling, given the rich variety of ARM hardware in existence.

Well... I yet have to look at the current ARM stuff from Russell, but
I've looked at Thomas "port" and there's a lot of stuff in there that I
dislike. I yet have to be convinced that whatever it provides can't be
done with the current code. Now of course, I'll be happy to discuss
every single technical detail until we come to an agreement :) But I'm
afraid this won't make it for 2.6.18. If it does, fine. If it doesn't,
I'd like my approach to be merged for that kernel version as I really
want that in for handling some of the upcoming cell stuff.

I plan to spend a few hours as soon as I find the time to explain more
in depth the technical reasons why I don't like some aspects of Thomas
work, but I want first to take the time to better review the current ARM
implementation and I was sort-of waiting for him to go public so we have
an open discussion.

> Just some stats to back this up: arch/arm*/ has 118 separate PIC 
> implementations, and amongst them there are more than 40 that need 
> demultiplex handlers. As a comparison, arch/[ppc/powerpc] sports 27 PIC 
> implementations, amongst which there are 5 that need demultiplex 
> handlers. arch/i386 has 10 PIC implementations and none of them need 
> demultiplex handlers [there are minor forms of cascading in x86 
> hardware, but none need true irq-vector demultiplexing].

Heh, we don't need to do a pissing contest here :)

> Most architectures modeled their IRQ layer after i386 IRQ layer, and 
> this resulted in the first phase of generic IRQ layer being quite 
> similar to the i386 layer. But the largest and most versatile Linux IRQ 
> subsystem was left out of the original genirq design (done by yours 
> truly), and many good bits (amongst them the right way to handle 
> cascading/chaining) i missed.

Heh, I know and that's why I CC'ed you :)

>From what I've seen of Thomas initial work, if you remove all of the
renaming of things and splitting into irq_chip vs. irq_type, it
essentially boils down to having a way to re-implement the do_IRQ logic
(and it's flag/locking manipulations) based on the IRQ type. However,
from what I've seen, compared to the current code, it doens't provide
much that the current code can't do with maybe few bits & pieces
changed, adds overhead (another layer of indirect funciton calls is
expensive) and forces us in a model that don't match most of the
modern/advanced irq controllers nicely without bloat (basically having
spearate irq_type for them).

I'll get in more details later (it's a bit late here and I'm tired :)

> Current status of genirq: it has been part of the -rt tree for more than 
> a year, and an earlier version has been sent to lkml once already - 
> Thomas has submitted an improved version of it to the ARM lists roughly 
> a week ago and it is currently being tested on many ARM boards, with 
> good results.
> 
> (I suspect you are aware of our genirq efforts, hence did you Cc: Thomas 
> and me? If you are aware of it, please address the differences between 
> the two approaches and outline why you chose a different solution - 
> thanks!)

I will. Not tonight though.

> Most PIC implementations do not need to worry about PIC cascading, and 
> neither the SA_CASCADEIRQ nor the genirq approach impacts them. So my 
> analysis of your patch only involves true cascaded PICs and the ways to 
> support them cleanly.
> 
> For cascaded PICs, the main problem with the SA_CASCADEIRQ approach is 
> that it assumes that cascading/demultiplexing can be handled via a 
> "return irq" method, in an iterative way.

Yup.

> But this does not match the demultiplexing model that happens on the 
> majority of ARM boards for example: there a bitmask is read from the 
> secondary interrupt controller, and that bitmask might have multiple 
> bits set.

Which is also what happens on some PowerMacs

>  By returning only one bit [which iteration model your 
> interface forces], the other bits can be lost. On some hardware those 
> missed bits might be regenerated, but there is PIC hardware where that 
> information is permanently lost and we end up losing interrupts.

It's fairly simple to handle that in the cascaded controllre driver by
or'ing read bits into a "pending" mask and clearing them as they get
returned to the upper layer. I don't see the need of adding layers of
indirections to handle that simple trick. Looping is the right way to go
as my patches imlement to then get one of them at a time. Now if you are
worried that we might end up artificially prioritizing the first ones
that way, them we can simply also cache the last bit "scanned" in the
pending mask and start from there on the next iteration. Again, very
simple code that doesn't imho requires turning the whole model upside
down.

> Such mask-based demultiplexing PICs are not limited to ARM, they occur 
> in the PPC world too, for example in arch/ppc/syslib/m82xx_pci.c, 
> pq2pci_irq_demux():

 .../...

> If PCI_INT_MASK_REG is a read-once register, then the SA_CASCADEIRQ 
> method could result in lost interrupts. (i'm not totally sure, but i 
> think in this specific case that register is read-once and it also 
> involves an auto-ack. In any case, there definitely is ARM hardware 
> where this equivalent register is read-once.)

If implemented stupidly it surely would :) But as I wrote above, it can
easily be implemented in a way that won't result in lost interrupts.

> Even if PCI_INT_MASK_REG could be read in a non-destructive way, 
> multiple bits would need multiple iterations and multiple (unnecessary) 
> passes over the whole bitmask - and they would thus also need 
> unnecessary IO cycles to re-fetch the mask itself.

True, my approach would cause re-fetches from the mask though that could
be avoided with careful coding as well (only re-fetching when the
current "pointer" into the mask wraps around). I yet have to be convince
though that your model, while maybe slightly simplifying the
implementation of the cascaded controller "fetch" routine, justifies the
overhead & bloat of the generic code.

An example is controllers like OpenPIC/MPIC that handle masking and
priority stacking transparently currently require very little code in
the begin and end handlers of irq_desc. With Thomas model, begin() and
end() are gone in irq_chip. Thus, I would end up causing a full mask &
umask (significant overhead) if using the standard handlers for level
and edge irq_types. Which basically means that I would have to implement
different irq_types to avoid that.

In a similar vein, I'm losing startup/shutdown from the controller which
I'm using to internally setup the HyperTransport APICs. I could do it
elsewhere but it's not as nice imho... or use custom irq_type's. Same
for XICS.

Finally, set_affinity() is also missing from irq_chip.

So you basically moved all of the previous interface
(begin/end/startup/shutdown/etc...) to the IRQ controller to the new
"irq_type" layer which then "emulates" the functionality of a smart irq
controller on top of an "irq_chip" which represents a totally dumb
controller than can only mask/unmask.

I don't think it's the right approach.

> (Depending on how many different interrupt sources a secondary PIC 
> connects, and how frequently those are risen, this might or might not be 
> a real performance problem. But in any case, the "return irq" method is 
> certainly an ugly and unnatural model for such PICs and there is no 
> clean way to handle this type of cascading via the 'return irq' method.)
> 
> So the solution we took in genirq was to delegate the act of 
> demultiplexing into the _demultiplexing handler_, by adopting the ARM
> IRQ layer's approach of calling desc_handle_irq(irq, desc, regs). For
> example:

The "handler" is the irq_type structure. Looking at all irq_type
implementations in thomas patch, it's actually the only thing that
changes from a type to another.... The rest all goes to 'default'. I
think there is some over-engineering there.

If you guys really want to keep the concept of having a "handler" for
different implementation of controllers/irqs, them it could have simply
been part of the irq_desc.

I yet also have to figure out if your approach allows the cascade to be
shared with a normal IRQ. Mine does. (Yours might, I yt have to figure
out what happens if you end up trying to define more than one type per
irq, I suppose it's not possible but I don't have the code at hand right
now)

> desc_handle_irq() does the locking and the calling of the highlevel irq 
> handler of the secondary PIC. [which then processes the device IRQ 
> handler actions and does any pre/post ACKing logic] There is no impact 
> to non-cascading PIC designs in this model either.

Except for one more level of function pointer indirection which can be
fairly expensive.

> Another, conceptual level problem is that (ab-)using the irq handler 
> methods to return an actual interrupt number is a layering violation. 
> There is not much in common between an IRQ demultiplexer function and an 
> interrupt handler function. The act of cascading two PICs _inevitably_ 
> means that there is a 1:N relationship between the two PICs, while an 
> IRQ handler is normally a 1:1 relationship between a hardware device and 
> a PIC. (there are exceptions like shared interrupts, but the norm we are 
> designing for is a 1:1 relationship) There are many other fundamental 
> differences too. Thus in our genirq work we have separated these two 
> concepts.

This is not completely true. It's actually fairly common that a device
itself is a demuxer. It's interrupt handler then reads a status register
to check which of the many local sources triggered. I don't think there
is any fundamental difference between a cascaded controller and a
device.

> In terms of patch merging (unless there are some arguments i've 
> overlooked), due to the reasons above i'm against merging SA_CASCADEIRQ, 
> even if it's relatively simple. It does not solve the demultiplexing 
> problem for most of the ARM handlers (nor for the PPC example i cited), 
> hence a separate variant has to be implemented anyway which results in 
> unnecessary code and concept duplication.

I think it does solve the problem due to my above explanations :)

> The current PPC/powerpc approach of calling __do_IRQ() might be hacky, 
> but it's functional, so neither is there any instant urgency AFAICS. If 
> our more complete genirq approach is rejected for whatever reason then 
> the SA_CASCADEIRQ patch can still be revisited as a secondary choice.
> 
> Nor can i see any big cleanup effect in terms of per-arch PIC code, the 
> patch actually adds a bit of code:

Not really, it will go away once I remove the special code to handle
cascades at the PIC level that we currently have and that will be made
obsolete by this mecanism.

In addtition, there are other issues imho with the implementation I've
seen in Thomas patch. Since I finally explained most of my concerns
above instead of in a couple of days, I'll finish now which whatever
remains in my mind of reading it yesterday :) That is essentially that i
find the approach of splitting the level and edge handlers less robust.

The current handler has the IN_PROGRESS/PENDING mecanism. The patch
keeps that only for edge interrupts and not for level interrupts. That
might seem fine on the paper, but as soon as an irq controller starts
misbehaving for some reason, the security it provides to avoid
re-entrency in a pending interrupt will be lost for level interrupts.

I think I had a couple of other concerns but I don't have them on the
top of my mind right now. Let's see how we go with that now tough. I've
addressed my main issues I think.

Ben.


^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2006-05-11 11:49 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2006-05-11  5:26 [RFC][PATCH] Cascaded interrupts: a simple solution Benjamin Herrenschmidt
2006-05-11  5:41 ` [PATCH] PowerMac use of SA_CASCADEIRQ Benjamin Herrenschmidt
2006-05-11  6:13 ` [RFC][PATCH] Cascaded interrupts: a simple solution Benjamin Herrenschmidt
2006-05-11  9:37 ` Ingo Molnar
2006-05-11 11:47   ` Benjamin Herrenschmidt

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).