All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH/RFC] rtc: rtc-twl: Fixed nested IRQ handling in resume from suspend
@ 2014-09-13 11:19 Laurent Pinchart
  2014-09-13 19:12 ` Thomas Gleixner
  2014-09-17 21:57 ` Thomas Gleixner
  0 siblings, 2 replies; 6+ messages in thread
From: Laurent Pinchart @ 2014-09-13 11:19 UTC (permalink / raw)
  To: linux-kernel
  Cc: rtc-linux, Rafael J. Wysocki, Thomas Gleixner, laurent.pinchart

The TWL RTC interrupt is a double-nested threaded interrupt, handled
through the TWL SIH (Secondary Interrupt Handler) and PIH (Primary
Interrupt Handler).

When the system is woken up from suspend by a TWL RTC alarm interrupt,
the TWL PIH and SIH are enabled first (due to the normal IRQ enabling
sequence for the PIH and to the IRQF_EARLY_RESUME flag for the SIH)
before the TWL RTC interrupt gets enabled. This results on the interrupt
being processed by the TWL primary interrupt handler, forwarded to the
nested SIH, and then marked as pending for the RTC by handle_nested_irq
called from the SIH.

The RTC interrupt then eventually gets reenabled the kernel, which will
try to call its top half interrupt handler. As the interrupt is a nested
threaded IRQ, its primary handler has been set to the
irq_nested_primary_handler function, which is never supposed to be
called and generates a WARN_ON, without waking the IRQ thread up.

Fix this by setting the IRQF_EARLY_RESUME for the TWL RTC interrupt to
ensure it gets enabled before the parent handlers try to process it.

This is likely a bit of a hack, I have a feeling that a more generic
solution that would fix the problem for all nested threaded IRQs enabled
as a wake up source by enable_irq_wake would be better.

I've noticed that Rafael's "PM / sleep / irq: Do not suspend wakeup
interrupts" patch (which has been reverted) might be related, but it
resulted in an endless stream of I2C errors in the kernel log at resume
time:

[  189.730682] twl4030: I2C error -13 reading PIH ISR
[  189.730712] twl: Read failed (mod 1, reg 0x01 count 1)

Given that I'm not that familiar with the IRQ subsystem I would
appreciate some guidance on how to properly solve the problem.

Signed-off-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
---
 drivers/rtc/rtc-twl.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/rtc/rtc-twl.c b/drivers/rtc/rtc-twl.c
index 1915464..3155af1 100644
--- a/drivers/rtc/rtc-twl.c
+++ b/drivers/rtc/rtc-twl.c
@@ -536,7 +536,8 @@ static int twl_rtc_probe(struct platform_device *pdev)
 
 	ret = devm_request_threaded_irq(&pdev->dev, irq, NULL,
 					twl_rtc_interrupt,
-					IRQF_TRIGGER_RISING | IRQF_ONESHOT,
+					IRQF_TRIGGER_RISING | IRQF_ONESHOT |
+					IRQF_EARLY_RESUME,
 					dev_name(&rtc->dev), rtc);
 	if (ret < 0) {
 		dev_err(&pdev->dev, "IRQ is not free.\n");
-- 
Regards,

Laurent Pinchart


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH/RFC] rtc: rtc-twl: Fixed nested IRQ handling in resume from suspend
  2014-09-13 11:19 [PATCH/RFC] rtc: rtc-twl: Fixed nested IRQ handling in resume from suspend Laurent Pinchart
@ 2014-09-13 19:12 ` Thomas Gleixner
  2014-09-14 23:41   ` Laurent Pinchart
  2014-09-17 21:57 ` Thomas Gleixner
  1 sibling, 1 reply; 6+ messages in thread
From: Thomas Gleixner @ 2014-09-13 19:12 UTC (permalink / raw)
  To: Laurent Pinchart; +Cc: linux-kernel, rtc-linux, Rafael J. Wysocki

On Sat, 13 Sep 2014, Laurent Pinchart wrote:

> The TWL RTC interrupt is a double-nested threaded interrupt, handled
> through the TWL SIH (Secondary Interrupt Handler) and PIH (Primary
> Interrupt Handler).
> 
> When the system is woken up from suspend by a TWL RTC alarm interrupt,
> the TWL PIH and SIH are enabled first (due to the normal IRQ enabling
> sequence for the PIH and to the IRQF_EARLY_RESUME flag for the SIH)
> before the TWL RTC interrupt gets enabled. This results on the interrupt
> being processed by the TWL primary interrupt handler, forwarded to the
> nested SIH, and then marked as pending for the RTC by handle_nested_irq
> called from the SIH.
> 
> The RTC interrupt then eventually gets reenabled the kernel, which will
> try to call its top half interrupt handler. As the interrupt is a nested
> threaded IRQ, its primary handler has been set to the
> irq_nested_primary_handler function, which is never supposed to be
> called and generates a WARN_ON, without waking the IRQ thread up.
> 
> Fix this by setting the IRQF_EARLY_RESUME for the TWL RTC interrupt to
> ensure it gets enabled before the parent handlers try to process it.
> 
> This is likely a bit of a hack, I have a feeling that a more generic
> solution that would fix the problem for all nested threaded IRQs enabled
> as a wake up source by enable_irq_wake would be better.

Indeed. It's a hack. This is not the first abuse of IRQF_EARLY_RESUME
which is used to "fix" ordering issues with nested thread handlers.
 
I haven't come around yet to analyze the issue and come up with a
proper core side mechanism to handle that case. I put it on the "look
at it while trapped in a tin can" list.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH/RFC] rtc: rtc-twl: Fixed nested IRQ handling in resume from suspend
  2014-09-13 19:12 ` Thomas Gleixner
@ 2014-09-14 23:41   ` Laurent Pinchart
  0 siblings, 0 replies; 6+ messages in thread
From: Laurent Pinchart @ 2014-09-14 23:41 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: linux-kernel, rtc-linux, Rafael J. Wysocki

Hi Thomas,

On Saturday 13 September 2014 21:12:16 Thomas Gleixner wrote:
> On Sat, 13 Sep 2014, Laurent Pinchart wrote:
> > The TWL RTC interrupt is a double-nested threaded interrupt, handled
> > through the TWL SIH (Secondary Interrupt Handler) and PIH (Primary
> > Interrupt Handler).
> > 
> > When the system is woken up from suspend by a TWL RTC alarm interrupt,
> > the TWL PIH and SIH are enabled first (due to the normal IRQ enabling
> > sequence for the PIH and to the IRQF_EARLY_RESUME flag for the SIH)
> > before the TWL RTC interrupt gets enabled. This results on the interrupt
> > being processed by the TWL primary interrupt handler, forwarded to the
> > nested SIH, and then marked as pending for the RTC by handle_nested_irq
> > called from the SIH.
> > 
> > The RTC interrupt then eventually gets reenabled the kernel, which will
> > try to call its top half interrupt handler. As the interrupt is a nested
> > threaded IRQ, its primary handler has been set to the
> > irq_nested_primary_handler function, which is never supposed to be
> > called and generates a WARN_ON, without waking the IRQ thread up.
> > 
> > Fix this by setting the IRQF_EARLY_RESUME for the TWL RTC interrupt to
> > ensure it gets enabled before the parent handlers try to process it.
> > 
> > This is likely a bit of a hack, I have a feeling that a more generic
> > solution that would fix the problem for all nested threaded IRQs enabled
> > as a wake up source by enable_irq_wake would be better.
> 
> Indeed. It's a hack. This is not the first abuse of IRQF_EARLY_RESUME
> which is used to "fix" ordering issues with nested thread handlers.
> 
> I haven't come around yet to analyze the issue and come up with a proper
> core side mechanism to handle that case. I put it on the "look at it while
> trapped in a tin can" list.

Should this patch be applied in the meantime, or do you think you will be 
trapped in a tin can in the not too distant future ?

-- 
Regards,

Laurent Pinchart


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH/RFC] rtc: rtc-twl: Fixed nested IRQ handling in resume from suspend
  2014-09-13 11:19 [PATCH/RFC] rtc: rtc-twl: Fixed nested IRQ handling in resume from suspend Laurent Pinchart
  2014-09-13 19:12 ` Thomas Gleixner
@ 2014-09-17 21:57 ` Thomas Gleixner
  2014-09-21  1:04   ` Rafael J. Wysocki
  2015-02-26 13:05   ` Gerlando Falauto
  1 sibling, 2 replies; 6+ messages in thread
From: Thomas Gleixner @ 2014-09-17 21:57 UTC (permalink / raw)
  To: Laurent Pinchart; +Cc: linux-kernel, rtc-linux, Rafael J. Wysocki

On Sat, 13 Sep 2014, Laurent Pinchart wrote:

> The TWL RTC interrupt is a double-nested threaded interrupt, handled
> through the TWL SIH (Secondary Interrupt Handler) and PIH (Primary
> Interrupt Handler).
> 
> When the system is woken up from suspend by a TWL RTC alarm interrupt,
> the TWL PIH and SIH are enabled first (due to the normal IRQ enabling
> sequence for the PIH and to the IRQF_EARLY_RESUME flag for the SIH)
> before the TWL RTC interrupt gets enabled. This results on the interrupt
> being processed by the TWL primary interrupt handler, forwarded to the
> nested SIH, and then marked as pending for the RTC by handle_nested_irq
> called from the SIH.
> 
> The RTC interrupt then eventually gets reenabled the kernel, which will
> try to call its top half interrupt handler. As the interrupt is a nested
> threaded IRQ, its primary handler has been set to the
> irq_nested_primary_handler function, which is never supposed to be
> called and generates a WARN_ON, without waking the IRQ thread up.

Right. It CANNOT wake up the thread, because there is no thread
associated to that particular interrupt. It's handler is called in the
context of the parent (demultiplexing) interrupt thread. Of course
twl4030 does not call irq_set_parent() for the nested
interrupts. That's there so the resend of a nested thread irq will be
targeted to its parent.
 
Using IRQF_EARLY_RESUME is really, really wrong for device drivers
simply because at the point where early resume is called the devices
have not yet been resumed, so a interrupt delivered at this point
might run into a stale device and cause a machine stall or any other
undesired side effect. It was added for a special case with Xen where
Xen needs the IPIs working early in resume. And it's definitely not
meant to solve ordering issues of interrupts on resume.

That said, looking at that twl4030 driver, there seems to be a double
nesting issue. So also the simple one level parent redirection of the
irq resend wont work. I really wonder why this only triggers in the
context of resume.

Now the resend issue is simple to fix. The resume time ordering
constraints is a bit more involved, but it should be possible w/o
inflicting anything more complex on drivers than requiring them to use
irq_set_parent(), which should be name irq_set_nested_parent().

Completely untested patch below. It applies on top of

 git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git irq/pm

So what twl4030 needs on top of that are calls to
irq_set_nested_parent() for the nested interrupts.

That will automatically establish the nesting depth, redirect the
retrigger to the top most interrupt and solve the resume ordering.

The resume ordering is the reverse of the nesting:

    top-irq1 - nested irq10   - nested irq20 (parent = 10)
    	     | (parent = 1)   - nested irq21 (parent = 10)
	     |
    	     - nested irq11   - nested irq22 (parent = 11)	
	     | (parent = 1)   - nested irq23 (parent = 11)
	     |
	     - nested irq12   - nested irq24 (parent = 12)
	       (parent = 1)   - nested irq25 (parent = 12)

So the resume ordering is

   20-21-22-23-24-25 - 10-11-12 - 1

Thanks,

	tglx

---
 drivers/mfd/twl6030-irq.c |    3 --
 include/linux/irq.h       |    9 ------
 include/linux/irqdesc.h   |    5 +++
 kernel/irq/manage.c       |   28 ++++++++++++++++++---
 kernel/irq/pm.c           |   60 +++++++++++++++++++++++++++++++++++++++++-----
 kernel/irq/resend.c       |    4 +--
 kernel/irq/settings.h     |    5 +++
 7 files changed, 92 insertions(+), 22 deletions(-)

Index: tip/drivers/mfd/twl6030-irq.c
===================================================================
--- tip.orig/drivers/mfd/twl6030-irq.c
+++ tip/drivers/mfd/twl6030-irq.c
@@ -350,8 +350,7 @@ static int twl6030_irq_map(struct irq_do
 
 	irq_set_chip_data(virq, pdata);
 	irq_set_chip_and_handler(virq,  &pdata->irq_chip, handle_simple_irq);
-	irq_set_nested_thread(virq, true);
-	irq_set_parent(virq, pdata->twl_irq);
+	irq_set_nested_parent(virq, pdata->twl_irq);
 
 #ifdef CONFIG_ARM
 	/*
Index: tip/include/linux/irq.h
===================================================================
--- tip.orig/include/linux/irq.h
+++ tip/include/linux/irq.h
@@ -415,14 +415,7 @@ static inline void irq_move_masked_irq(s
 
 extern int no_irq_affinity;
 
-#ifdef CONFIG_HARDIRQS_SW_RESEND
-int irq_set_parent(int irq, int parent_irq);
-#else
-static inline int irq_set_parent(int irq, int parent_irq)
-{
-	return 0;
-}
-#endif
+int irq_set_nested_parent(int irq, int parent_irq);
 
 /*
  * Built-in IRQ handlers for various IRQ types,
Index: tip/include/linux/irqdesc.h
===================================================================
--- tip.orig/include/linux/irqdesc.h
+++ tip/include/linux/irqdesc.h
@@ -41,6 +41,9 @@ struct irq_desc;
  *			IRQF_NO_SUSPEND set
  * @force_resume_depth:	number of irqactions on a irq descriptor with
  *			IRQF_FORCE_RESUME set
+ * @parent_irq:		Parent interrupt in case of nested threads
+ * @parent_top:		Top parent interrupt in case of multiple nested threads
+ * @parent_depth:	Nest level of multiple nested threads for resume ordering
  * @dir:		/proc/irq/ procfs entry
  * @name:		flow handler name for /proc/interrupts output
  */
@@ -82,6 +85,8 @@ struct irq_desc {
 	struct proc_dir_entry	*dir;
 #endif
 	int			parent_irq;
+	int			parent_depth;
+	int			parent_top;
 	struct module		*owner;
 	const char		*name;
 } ____cacheline_internodealigned_in_smp;
Index: tip/kernel/irq/manage.c
===================================================================
--- tip.orig/kernel/irq/manage.c
+++ tip/kernel/irq/manage.c
@@ -624,21 +624,41 @@ int __irq_set_trigger(struct irq_desc *d
 	return ret;
 }
 
-#ifdef CONFIG_HARDIRQS_SW_RESEND
-int irq_set_parent(int irq, int parent_irq)
+int irq_set_nested_parent(int irq, int parent_irq)
 {
+	struct irq_desc *desc;
 	unsigned long flags;
-	struct irq_desc *desc = irq_get_desc_lock(irq, &flags, 0);
+	int parent_depth, parent_top;
 
+	/*
+	 * Get the parent irq first to retrieve the parent depth and
+	 * the top level parent irq number. The depth is required for
+	 * resume ordering, the top level irq number for software
+	 * resend.
+	 */
+	desc = irq_get_desc_lock(parent_irq, &flags, 0);
+	if (!desc)
+		return -EINVAL;
+	parent_depth = desc->parent_depth;
+	parent_top = desc->parent_top;
+	irq_put_desc_unlock(desc, flags);
+
+	/* Setup the info in the child */
+	desc = irq_get_desc_lock(parent_irq, &flags, 0);
 	if (!desc)
 		return -EINVAL;
 
 	desc->parent_irq = parent_irq;
+	desc->parent_top = parent_top ? parent_top : parent_irq;
+	desc->parent_depth = parent_depth + 1;
+
+	/* Set the nested thread bit as well */
+	irq_settings_set_nested_thread(desc);
 
 	irq_put_desc_unlock(desc, flags);
+
 	return 0;
 }
-#endif
 
 /*
  * Default primary interrupt handler for threaded interrupts. Is
Index: tip/kernel/irq/pm.c
===================================================================
--- tip.orig/kernel/irq/pm.c
+++ tip/kernel/irq/pm.c
@@ -8,6 +8,7 @@
 
 #include <linux/irq.h>
 #include <linux/module.h>
+#include <linux/bitmap.h>
 #include <linux/interrupt.h>
 #include <linux/suspend.h>
 #include <linux/syscore_ops.h>
@@ -93,6 +94,37 @@ static bool suspend_device_irq(struct ir
 	return true;
 }
 
+#define IRQ_NEST_DEPTH 4
+
+struct resume_level {
+	unsigned long	map[BITS_TO_LONGS(IRQ_BITMAP_BITS)];
+	int		top;
+};
+
+static struct resume_level resume_order_lvls[IRQ_NEST_DEPTH];
+static struct resume_level resume_early_lvl;
+static int resume_order_max_depth;
+
+static void update_resume_order(struct irq_desc *desc, int irq)
+{
+	bool early = desc->action && desc->action->flags & IRQF_EARLY_RESUME;
+	int depth = desc->parent_depth;
+	struct resume_level *l;
+
+	if (!early) {
+		if (WARN_ON_ONCE(depth >= IRQ_NEST_DEPTH))
+			depth = IRQ_NEST_DEPTH - 1;
+		l = resume_order_lvls + depth;
+		if (depth > resume_order_max_depth)
+			resume_order_max_depth = depth;
+	} else {
+		WARN_ON_ONCE(depth != 0);
+		l = &resume_early_lvl;
+	}
+	set_bit(irq, l->map);
+	l->top = irq;
+}
+
 /**
  * suspend_device_irqs - disable all currently enabled interrupt lines
  *
@@ -114,11 +146,16 @@ void suspend_device_irqs(void)
 	struct irq_desc *desc;
 	int irq;
 
+	memset(resume_order_lvls, 0, sizeof(resume_order_lvls));
+	memset(resume_early_lvl, 0, sizeof(resume_early_lvl));
+	resume_order_max_depth = 0;
+
 	for_each_irq_desc(irq, desc) {
 		unsigned long flags;
 		bool sync;
 
 		raw_spin_lock_irqsave(&desc->lock, flags);
+		update_resume_order(desc, irq);
 		sync = suspend_device_irq(desc, irq);
 		raw_spin_unlock_irqrestore(&desc->lock, flags);
 
@@ -146,17 +183,16 @@ resume:
 	__enable_irq(desc, irq);
 }
 
-static void resume_irqs(bool want_early)
+static void resume_irq_lvl(struct resume_level *l)
 {
 	struct irq_desc *desc;
+	unsigned long flags;
 	int irq;
 
-	for_each_irq_desc(irq, desc) {
-		unsigned long flags;
-		bool is_early = desc->action &&
-			desc->action->flags & IRQF_EARLY_RESUME;
+	for_each_set_bit(irq, l->map, l->top + 1) {
 
-		if (!is_early && want_early)
+		desc = irq_to_desc(irq);
+		if (WARN_ON_ONCE(!desc))
 			continue;
 
 		raw_spin_lock_irqsave(&desc->lock, flags);
@@ -165,6 +201,18 @@ static void resume_irqs(bool want_early)
 	}
 }
 
+static void resume_irqs(bool early)
+{
+	int i;
+
+	if (early) {
+		resume_irq_lvl(&resume_early_lvl);
+	} else {
+		for (i = resume_order_max_depth; i >= 0; i--)
+			resume_irq_lvl(resume_order_lvls + i);
+	}
+}
+
 /**
  * irq_pm_syscore_ops - enable interrupt lines early
  *
Index: tip/kernel/irq/resend.c
===================================================================
--- tip.orig/kernel/irq/resend.c
+++ tip/kernel/irq/resend.c
@@ -79,9 +79,9 @@ void check_irq_resend(struct irq_desc *d
 			 * in the thread context of the parent irq,
 			 * retrigger the parent.
 			 */
-			if (desc->parent_irq &&
+			if (desc->parent_top &&
 			    irq_settings_is_nested_thread(desc))
-				irq = desc->parent_irq;
+				irq = desc->parent_top;
 			/* Set it pending and activate the softirq: */
 			set_bit(irq, irqs_resend);
 			tasklet_schedule(&resend_tasklet);
Index: tip/kernel/irq/settings.h
===================================================================
--- tip.orig/kernel/irq/settings.h
+++ tip/kernel/irq/settings.h
@@ -53,6 +53,11 @@ static inline void irq_settings_set_per_
 	desc->status_use_accessors |= _IRQ_PER_CPU;
 }
 
+static inline void irq_settings_set_nested_thread(struct irq_desc *desc)
+{
+	desc->status_use_accessors |= _IRQ_NESTED_THREAD;
+}
+
 static inline void irq_settings_set_no_balancing(struct irq_desc *desc)
 {
 	desc->status_use_accessors |= _IRQ_NO_BALANCING;




^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH/RFC] rtc: rtc-twl: Fixed nested IRQ handling in resume from suspend
  2014-09-17 21:57 ` Thomas Gleixner
@ 2014-09-21  1:04   ` Rafael J. Wysocki
  2015-02-26 13:05   ` Gerlando Falauto
  1 sibling, 0 replies; 6+ messages in thread
From: Rafael J. Wysocki @ 2014-09-21  1:04 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: Laurent Pinchart, linux-kernel, rtc-linux

On Wednesday, September 17, 2014 02:57:11 PM Thomas Gleixner wrote:
> On Sat, 13 Sep 2014, Laurent Pinchart wrote:
> 
> > The TWL RTC interrupt is a double-nested threaded interrupt, handled
> > through the TWL SIH (Secondary Interrupt Handler) and PIH (Primary
> > Interrupt Handler).
> > 
> > When the system is woken up from suspend by a TWL RTC alarm interrupt,
> > the TWL PIH and SIH are enabled first (due to the normal IRQ enabling
> > sequence for the PIH and to the IRQF_EARLY_RESUME flag for the SIH)
> > before the TWL RTC interrupt gets enabled. This results on the interrupt
> > being processed by the TWL primary interrupt handler, forwarded to the
> > nested SIH, and then marked as pending for the RTC by handle_nested_irq
> > called from the SIH.
> > 
> > The RTC interrupt then eventually gets reenabled the kernel, which will
> > try to call its top half interrupt handler. As the interrupt is a nested
> > threaded IRQ, its primary handler has been set to the
> > irq_nested_primary_handler function, which is never supposed to be
> > called and generates a WARN_ON, without waking the IRQ thread up.
> 
> Right. It CANNOT wake up the thread, because there is no thread
> associated to that particular interrupt. It's handler is called in the
> context of the parent (demultiplexing) interrupt thread. Of course
> twl4030 does not call irq_set_parent() for the nested
> interrupts. That's there so the resend of a nested thread irq will be
> targeted to its parent.
>  
> Using IRQF_EARLY_RESUME is really, really wrong for device drivers
> simply because at the point where early resume is called the devices
> have not yet been resumed, so a interrupt delivered at this point
> might run into a stale device and cause a machine stall or any other
> undesired side effect. It was added for a special case with Xen where
> Xen needs the IPIs working early in resume. And it's definitely not
> meant to solve ordering issues of interrupts on resume.
> 
> That said, looking at that twl4030 driver, there seems to be a double
> nesting issue. So also the simple one level parent redirection of the
> irq resend wont work. I really wonder why this only triggers in the
> context of resume.
> 
> Now the resend issue is simple to fix. The resume time ordering
> constraints is a bit more involved, but it should be possible w/o
> inflicting anything more complex on drivers than requiring them to use
> irq_set_parent(), which should be name irq_set_nested_parent().
> 
> Completely untested patch below. It applies on top of
> 
>  git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git irq/pm
> 
> So what twl4030 needs on top of that are calls to
> irq_set_nested_parent() for the nested interrupts.
> 
> That will automatically establish the nesting depth, redirect the
> retrigger to the top most interrupt and solve the resume ordering.
> 
> The resume ordering is the reverse of the nesting:
> 
>     top-irq1 - nested irq10   - nested irq20 (parent = 10)
>     	     | (parent = 1)   - nested irq21 (parent = 10)
> 	     |
>     	     - nested irq11   - nested irq22 (parent = 11)	
> 	     | (parent = 1)   - nested irq23 (parent = 11)
> 	     |
> 	     - nested irq12   - nested irq24 (parent = 12)
> 	       (parent = 1)   - nested irq25 (parent = 12)
> 
> So the resume ordering is
> 
>    20-21-22-23-24-25 - 10-11-12 - 1

I was thinking about this earlier today.

Can't we do something like this (pseudo code) during resume:

resume_irq(irq) {
	if (has parent_irq)
		resume_irq(parent_irq);

	do stuff;
}

which will get us the right ordering without using bitmaps and visiting
the same one twice (once from the child and once from the main resume loop)
doesn't matter?

Rafael


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH/RFC] rtc: rtc-twl: Fixed nested IRQ handling in resume from suspend
  2014-09-17 21:57 ` Thomas Gleixner
  2014-09-21  1:04   ` Rafael J. Wysocki
@ 2015-02-26 13:05   ` Gerlando Falauto
  1 sibling, 0 replies; 6+ messages in thread
From: Gerlando Falauto @ 2015-02-26 13:05 UTC (permalink / raw)
  To: Thomas Gleixner, Laurent Pinchart
  Cc: linux-kernel, rtc-linux, Rafael J. Wysocki

Hi Thomas,

did anything ever come out of this?

I'm encountering a similar but different problem, where a nested 
interrupt handler is called directly from the resend tasklet (and 
therefore -- if I'm not mistaken -- in an atomic context, which is 
unexpected). This issue however appears during system startup, when the 
interrupt is first enabled, and is very, very hard to reproduce. So I 
don't have much room for further investigation.

Do you have any clue what might be the sequence of events leading to 
this behavior?

While I reckon that using irq_set_parent() on the nested IRQ might as 
well fix the issue (or just as well hide it), I just don't understand 
how this all works. Plus, I wonder why it is (almost) never used... I 
only found two occurrences of irq_set_parent() on 3.19-rc5 (gpiolib.c 
and twl6030-irq.c), whereas exactly zero on 3.10.60.
Is there any known way to trigger/test interrupt resending (which, I 
believe, has been around for a very long time)?

Thank you!
Gerlando

On 09/17/2014 11:57 PM, Thomas Gleixner wrote:
> On Sat, 13 Sep 2014, Laurent Pinchart wrote:
>
>> The TWL RTC interrupt is a double-nested threaded interrupt, handled
>> through the TWL SIH (Secondary Interrupt Handler) and PIH (Primary
>> Interrupt Handler).
>>
>> When the system is woken up from suspend by a TWL RTC alarm interrupt,
>> the TWL PIH and SIH are enabled first (due to the normal IRQ enabling
>> sequence for the PIH and to the IRQF_EARLY_RESUME flag for the SIH)
>> before the TWL RTC interrupt gets enabled. This results on the interrupt
>> being processed by the TWL primary interrupt handler, forwarded to the
>> nested SIH, and then marked as pending for the RTC by handle_nested_irq
>> called from the SIH.
>>
>> The RTC interrupt then eventually gets reenabled the kernel, which will
>> try to call its top half interrupt handler. As the interrupt is a nested
>> threaded IRQ, its primary handler has been set to the
>> irq_nested_primary_handler function, which is never supposed to be
>> called and generates a WARN_ON, without waking the IRQ thread up.
>
> Right. It CANNOT wake up the thread, because there is no thread
> associated to that particular interrupt. It's handler is called in the
> context of the parent (demultiplexing) interrupt thread. Of course
> twl4030 does not call irq_set_parent() for the nested
> interrupts. That's there so the resend of a nested thread irq will be
> targeted to its parent.
>
> Using IRQF_EARLY_RESUME is really, really wrong for device drivers
> simply because at the point where early resume is called the devices
> have not yet been resumed, so a interrupt delivered at this point
> might run into a stale device and cause a machine stall or any other
> undesired side effect. It was added for a special case with Xen where
> Xen needs the IPIs working early in resume. And it's definitely not
> meant to solve ordering issues of interrupts on resume.
>
> That said, looking at that twl4030 driver, there seems to be a double
> nesting issue. So also the simple one level parent redirection of the
> irq resend wont work. I really wonder why this only triggers in the
> context of resume.
>
> Now the resend issue is simple to fix. The resume time ordering
> constraints is a bit more involved, but it should be possible w/o
> inflicting anything more complex on drivers than requiring them to use
> irq_set_parent(), which should be name irq_set_nested_parent().
>
> Completely untested patch below. It applies on top of
>
>   git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git irq/pm
>
> So what twl4030 needs on top of that are calls to
> irq_set_nested_parent() for the nested interrupts.
>
> That will automatically establish the nesting depth, redirect the
> retrigger to the top most interrupt and solve the resume ordering.
>
> The resume ordering is the reverse of the nesting:
>
>      top-irq1 - nested irq10   - nested irq20 (parent = 10)
>      	     | (parent = 1)   - nested irq21 (parent = 10)
> 	     |
>      	     - nested irq11   - nested irq22 (parent = 11)	
> 	     | (parent = 1)   - nested irq23 (parent = 11)
> 	     |
> 	     - nested irq12   - nested irq24 (parent = 12)
> 	       (parent = 1)   - nested irq25 (parent = 12)
>
> So the resume ordering is
>
>     20-21-22-23-24-25 - 10-11-12 - 1
>
> Thanks,
>
> 	tglx
>
> ---
>   drivers/mfd/twl6030-irq.c |    3 --
>   include/linux/irq.h       |    9 ------
>   include/linux/irqdesc.h   |    5 +++
>   kernel/irq/manage.c       |   28 ++++++++++++++++++---
>   kernel/irq/pm.c           |   60 +++++++++++++++++++++++++++++++++++++++++-----
>   kernel/irq/resend.c       |    4 +--
>   kernel/irq/settings.h     |    5 +++
>   7 files changed, 92 insertions(+), 22 deletions(-)
>
> Index: tip/drivers/mfd/twl6030-irq.c
> ===================================================================
> --- tip.orig/drivers/mfd/twl6030-irq.c
> +++ tip/drivers/mfd/twl6030-irq.c
> @@ -350,8 +350,7 @@ static int twl6030_irq_map(struct irq_do
>
>   	irq_set_chip_data(virq, pdata);
>   	irq_set_chip_and_handler(virq,  &pdata->irq_chip, handle_simple_irq);
> -	irq_set_nested_thread(virq, true);
> -	irq_set_parent(virq, pdata->twl_irq);
> +	irq_set_nested_parent(virq, pdata->twl_irq);
>
>   #ifdef CONFIG_ARM
>   	/*
> Index: tip/include/linux/irq.h
> ===================================================================
> --- tip.orig/include/linux/irq.h
> +++ tip/include/linux/irq.h
> @@ -415,14 +415,7 @@ static inline void irq_move_masked_irq(s
>
>   extern int no_irq_affinity;
>
> -#ifdef CONFIG_HARDIRQS_SW_RESEND
> -int irq_set_parent(int irq, int parent_irq);
> -#else
> -static inline int irq_set_parent(int irq, int parent_irq)
> -{
> -	return 0;
> -}
> -#endif
> +int irq_set_nested_parent(int irq, int parent_irq);
>
>   /*
>    * Built-in IRQ handlers for various IRQ types,
> Index: tip/include/linux/irqdesc.h
> ===================================================================
> --- tip.orig/include/linux/irqdesc.h
> +++ tip/include/linux/irqdesc.h
> @@ -41,6 +41,9 @@ struct irq_desc;
>    *			IRQF_NO_SUSPEND set
>    * @force_resume_depth:	number of irqactions on a irq descriptor with
>    *			IRQF_FORCE_RESUME set
> + * @parent_irq:		Parent interrupt in case of nested threads
> + * @parent_top:		Top parent interrupt in case of multiple nested threads
> + * @parent_depth:	Nest level of multiple nested threads for resume ordering
>    * @dir:		/proc/irq/ procfs entry
>    * @name:		flow handler name for /proc/interrupts output
>    */
> @@ -82,6 +85,8 @@ struct irq_desc {
>   	struct proc_dir_entry	*dir;
>   #endif
>   	int			parent_irq;
> +	int			parent_depth;
> +	int			parent_top;
>   	struct module		*owner;
>   	const char		*name;
>   } ____cacheline_internodealigned_in_smp;
> Index: tip/kernel/irq/manage.c
> ===================================================================
> --- tip.orig/kernel/irq/manage.c
> +++ tip/kernel/irq/manage.c
> @@ -624,21 +624,41 @@ int __irq_set_trigger(struct irq_desc *d
>   	return ret;
>   }
>
> -#ifdef CONFIG_HARDIRQS_SW_RESEND
> -int irq_set_parent(int irq, int parent_irq)
> +int irq_set_nested_parent(int irq, int parent_irq)
>   {
> +	struct irq_desc *desc;
>   	unsigned long flags;
> -	struct irq_desc *desc = irq_get_desc_lock(irq, &flags, 0);
> +	int parent_depth, parent_top;
>
> +	/*
> +	 * Get the parent irq first to retrieve the parent depth and
> +	 * the top level parent irq number. The depth is required for
> +	 * resume ordering, the top level irq number for software
> +	 * resend.
> +	 */
> +	desc = irq_get_desc_lock(parent_irq, &flags, 0);
> +	if (!desc)
> +		return -EINVAL;
> +	parent_depth = desc->parent_depth;
> +	parent_top = desc->parent_top;
> +	irq_put_desc_unlock(desc, flags);
> +
> +	/* Setup the info in the child */
> +	desc = irq_get_desc_lock(parent_irq, &flags, 0);
>   	if (!desc)
>   		return -EINVAL;
>
>   	desc->parent_irq = parent_irq;
> +	desc->parent_top = parent_top ? parent_top : parent_irq;
> +	desc->parent_depth = parent_depth + 1;
> +
> +	/* Set the nested thread bit as well */
> +	irq_settings_set_nested_thread(desc);
>
>   	irq_put_desc_unlock(desc, flags);
> +
>   	return 0;
>   }
> -#endif
>
>   /*
>    * Default primary interrupt handler for threaded interrupts. Is
> Index: tip/kernel/irq/pm.c
> ===================================================================
> --- tip.orig/kernel/irq/pm.c
> +++ tip/kernel/irq/pm.c
> @@ -8,6 +8,7 @@
>
>   #include <linux/irq.h>
>   #include <linux/module.h>
> +#include <linux/bitmap.h>
>   #include <linux/interrupt.h>
>   #include <linux/suspend.h>
>   #include <linux/syscore_ops.h>
> @@ -93,6 +94,37 @@ static bool suspend_device_irq(struct ir
>   	return true;
>   }
>
> +#define IRQ_NEST_DEPTH 4
> +
> +struct resume_level {
> +	unsigned long	map[BITS_TO_LONGS(IRQ_BITMAP_BITS)];
> +	int		top;
> +};
> +
> +static struct resume_level resume_order_lvls[IRQ_NEST_DEPTH];
> +static struct resume_level resume_early_lvl;
> +static int resume_order_max_depth;
> +
> +static void update_resume_order(struct irq_desc *desc, int irq)
> +{
> +	bool early = desc->action && desc->action->flags & IRQF_EARLY_RESUME;
> +	int depth = desc->parent_depth;
> +	struct resume_level *l;
> +
> +	if (!early) {
> +		if (WARN_ON_ONCE(depth >= IRQ_NEST_DEPTH))
> +			depth = IRQ_NEST_DEPTH - 1;
> +		l = resume_order_lvls + depth;
> +		if (depth > resume_order_max_depth)
> +			resume_order_max_depth = depth;
> +	} else {
> +		WARN_ON_ONCE(depth != 0);
> +		l = &resume_early_lvl;
> +	}
> +	set_bit(irq, l->map);
> +	l->top = irq;
> +}
> +
>   /**
>    * suspend_device_irqs - disable all currently enabled interrupt lines
>    *
> @@ -114,11 +146,16 @@ void suspend_device_irqs(void)
>   	struct irq_desc *desc;
>   	int irq;
>
> +	memset(resume_order_lvls, 0, sizeof(resume_order_lvls));
> +	memset(resume_early_lvl, 0, sizeof(resume_early_lvl));
> +	resume_order_max_depth = 0;
> +
>   	for_each_irq_desc(irq, desc) {
>   		unsigned long flags;
>   		bool sync;
>
>   		raw_spin_lock_irqsave(&desc->lock, flags);
> +		update_resume_order(desc, irq);
>   		sync = suspend_device_irq(desc, irq);
>   		raw_spin_unlock_irqrestore(&desc->lock, flags);
>
> @@ -146,17 +183,16 @@ resume:
>   	__enable_irq(desc, irq);
>   }
>
> -static void resume_irqs(bool want_early)
> +static void resume_irq_lvl(struct resume_level *l)
>   {
>   	struct irq_desc *desc;
> +	unsigned long flags;
>   	int irq;
>
> -	for_each_irq_desc(irq, desc) {
> -		unsigned long flags;
> -		bool is_early = desc->action &&
> -			desc->action->flags & IRQF_EARLY_RESUME;
> +	for_each_set_bit(irq, l->map, l->top + 1) {
>
> -		if (!is_early && want_early)
> +		desc = irq_to_desc(irq);
> +		if (WARN_ON_ONCE(!desc))
>   			continue;
>
>   		raw_spin_lock_irqsave(&desc->lock, flags);
> @@ -165,6 +201,18 @@ static void resume_irqs(bool want_early)
>   	}
>   }
>
> +static void resume_irqs(bool early)
> +{
> +	int i;
> +
> +	if (early) {
> +		resume_irq_lvl(&resume_early_lvl);
> +	} else {
> +		for (i = resume_order_max_depth; i >= 0; i--)
> +			resume_irq_lvl(resume_order_lvls + i);
> +	}
> +}
> +
>   /**
>    * irq_pm_syscore_ops - enable interrupt lines early
>    *
> Index: tip/kernel/irq/resend.c
> ===================================================================
> --- tip.orig/kernel/irq/resend.c
> +++ tip/kernel/irq/resend.c
> @@ -79,9 +79,9 @@ void check_irq_resend(struct irq_desc *d
>   			 * in the thread context of the parent irq,
>   			 * retrigger the parent.
>   			 */
> -			if (desc->parent_irq &&
> +			if (desc->parent_top &&
>   			    irq_settings_is_nested_thread(desc))
> -				irq = desc->parent_irq;
> +				irq = desc->parent_top;
>   			/* Set it pending and activate the softirq: */
>   			set_bit(irq, irqs_resend);
>   			tasklet_schedule(&resend_tasklet);
> Index: tip/kernel/irq/settings.h
> ===================================================================
> --- tip.orig/kernel/irq/settings.h
> +++ tip/kernel/irq/settings.h
> @@ -53,6 +53,11 @@ static inline void irq_settings_set_per_
>   	desc->status_use_accessors |= _IRQ_PER_CPU;
>   }
>
> +static inline void irq_settings_set_nested_thread(struct irq_desc *desc)
> +{
> +	desc->status_use_accessors |= _IRQ_NESTED_THREAD;
> +}
> +
>   static inline void irq_settings_set_no_balancing(struct irq_desc *desc)
>   {
>   	desc->status_use_accessors |= _IRQ_NO_BALANCING;
>
>
>


^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2015-02-26 13:12 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-09-13 11:19 [PATCH/RFC] rtc: rtc-twl: Fixed nested IRQ handling in resume from suspend Laurent Pinchart
2014-09-13 19:12 ` Thomas Gleixner
2014-09-14 23:41   ` Laurent Pinchart
2014-09-17 21:57 ` Thomas Gleixner
2014-09-21  1:04   ` Rafael J. Wysocki
2015-02-26 13:05   ` Gerlando Falauto

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.