linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/2] pinctrl: mediatek: Fix 2 issues related to resume from wake sources.
@ 2019-04-29  3:55 Nicolas Boichat
  2019-04-29  3:55 ` [PATCH 1/2] pinctrl: mediatek: Ignore interrupts that are wake only during resume Nicolas Boichat
  2019-04-29  3:55 ` [PATCH 2/2] pinctrl: mediatek: Update cur_mask in mask/mask ops Nicolas Boichat
  0 siblings, 2 replies; 18+ messages in thread
From: Nicolas Boichat @ 2019-04-29  3:55 UTC (permalink / raw)
  To: linux-mediatek
  Cc: Sean Wang, Linus Walleij, Matthias Brugger, linux-gpio,
	linux-arm-kernel, linux-kernel, Chuanjia Liu, evgreen, swboyd

This fixes 2 issues when resuming from a wake source, especially if these
wake sources are level-sensitive.

Tested on mt8183 with the series in https://patchwork.kernel.org/cover/10921121/,
but this should affect all mediatek platforms.

Nicolas Boichat (2):
  pinctrl: mediatek: Ignore interrupts that are wake only during resume
  pinctrl: mediatek: Update cur_mask in mask/mask ops

 drivers/pinctrl/mediatek/mtk-eint.c | 34 ++++++++++++++++-------------
 1 file changed, 19 insertions(+), 15 deletions(-)

-- 
2.21.0.593.g511ec345e18-goog


^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH 1/2] pinctrl: mediatek: Ignore interrupts that are wake only during resume
  2019-04-29  3:55 [PATCH 0/2] pinctrl: mediatek: Fix 2 issues related to resume from wake sources Nicolas Boichat
@ 2019-04-29  3:55 ` Nicolas Boichat
  2019-06-21  4:47   ` Sean Wang
  2019-06-25 13:50   ` Linus Walleij
  2019-04-29  3:55 ` [PATCH 2/2] pinctrl: mediatek: Update cur_mask in mask/mask ops Nicolas Boichat
  1 sibling, 2 replies; 18+ messages in thread
From: Nicolas Boichat @ 2019-04-29  3:55 UTC (permalink / raw)
  To: linux-mediatek
  Cc: Sean Wang, Linus Walleij, Matthias Brugger, linux-gpio,
	linux-arm-kernel, linux-kernel, Chuanjia Liu, evgreen, swboyd

Before suspending, mtk-eint would set the interrupt mask to the
one in wake_mask. However, some of these interrupts may not have a
corresponding interrupt handler, or the interrupt may be disabled.

On resume, the eint irq handler would trigger nevertheless,
and irq/pm.c:irq_pm_check_wakeup would be called, which would
try to call irq_disable. However, if the interrupt is not enabled
(irqd_irq_disabled(&desc->irq_data) is true), the call does nothing,
and the interrupt is left enabled in the eint driver.

Especially for level-sensitive interrupts, this will lead to an
interrupt storm on resume.

If we detect that an interrupt is only in wake_mask, but not in
cur_mask, we can just mask it out immediately (as mtk_eint_resume
would do anyway at a later stage in the resume sequence, when
restoring cur_mask).

Fixes: bf22ff45bed ("genirq: Avoid unnecessary low level irq function calls")
Signed-off-by: Nicolas Boichat <drinkcat@chromium.org>
---
 drivers/pinctrl/mediatek/mtk-eint.c | 16 +++++++++++++++-
 1 file changed, 15 insertions(+), 1 deletion(-)

diff --git a/drivers/pinctrl/mediatek/mtk-eint.c b/drivers/pinctrl/mediatek/mtk-eint.c
index f464f8cd274b75c..737385e86beb807 100644
--- a/drivers/pinctrl/mediatek/mtk-eint.c
+++ b/drivers/pinctrl/mediatek/mtk-eint.c
@@ -318,7 +318,7 @@ static void mtk_eint_irq_handler(struct irq_desc *desc)
 	struct irq_chip *chip = irq_desc_get_chip(desc);
 	struct mtk_eint *eint = irq_desc_get_handler_data(desc);
 	unsigned int status, eint_num;
-	int offset, index, virq;
+	int offset, mask_offset, index, virq;
 	void __iomem *reg =  mtk_eint_get_offset(eint, 0, eint->regs->stat);
 	int dual_edge, start_level, curr_level;
 
@@ -328,10 +328,24 @@ static void mtk_eint_irq_handler(struct irq_desc *desc)
 		status = readl(reg);
 		while (status) {
 			offset = __ffs(status);
+			mask_offset = eint_num >> 5;
 			index = eint_num + offset;
 			virq = irq_find_mapping(eint->domain, index);
 			status &= ~BIT(offset);
 
+			/*
+			 * If we get an interrupt on pin that was only required
+			 * for wake (but no real interrupt requested), mask the
+			 * interrupt (as would mtk_eint_resume do anyway later
+			 * in the resume sequence).
+			 */
+			if (eint->wake_mask[mask_offset] & BIT(offset) &&
+			    !(eint->cur_mask[mask_offset] & BIT(offset))) {
+				writel_relaxed(BIT(offset), reg -
+					eint->regs->stat +
+					eint->regs->mask_set);
+			}
+
 			dual_edge = eint->dual_edge[index];
 			if (dual_edge) {
 				/*
-- 
2.21.0.593.g511ec345e18-goog


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH 2/2] pinctrl: mediatek: Update cur_mask in mask/mask ops
  2019-04-29  3:55 [PATCH 0/2] pinctrl: mediatek: Fix 2 issues related to resume from wake sources Nicolas Boichat
  2019-04-29  3:55 ` [PATCH 1/2] pinctrl: mediatek: Ignore interrupts that are wake only during resume Nicolas Boichat
@ 2019-04-29  3:55 ` Nicolas Boichat
  2019-05-13 22:29   ` Stephen Boyd
  2019-06-21  4:20   ` Sean Wang
  1 sibling, 2 replies; 18+ messages in thread
From: Nicolas Boichat @ 2019-04-29  3:55 UTC (permalink / raw)
  To: linux-mediatek
  Cc: Sean Wang, Linus Walleij, Matthias Brugger, linux-gpio,
	linux-arm-kernel, linux-kernel, Chuanjia Liu, evgreen, swboyd

During suspend/resume, mtk_eint_mask may be called while
wake_mask is active. For example, this happens if a wake-source
with an active interrupt handler wakes the system:
irq/pm.c:irq_pm_check_wakeup would disable the interrupt, so
that it can be handled later on in the resume flow.

However, this may happen before mtk_eint_do_resume is called:
in this case, wake_mask is loaded, and cur_mask is restored
from an older copy, re-enabling the interrupt, and causing
an interrupt storm (especially for level interrupts).

Instead, we just record mask/unmask changes in cur_mask. This
also avoids the need to read the current mask in eint_do_suspend,
and we can remove mtk_eint_chip_read_mask function.

Signed-off-by: Nicolas Boichat <drinkcat@chromium.org>
---
 drivers/pinctrl/mediatek/mtk-eint.c | 18 ++++--------------
 1 file changed, 4 insertions(+), 14 deletions(-)

diff --git a/drivers/pinctrl/mediatek/mtk-eint.c b/drivers/pinctrl/mediatek/mtk-eint.c
index 737385e86beb807..7e526bcf5e0b55c 100644
--- a/drivers/pinctrl/mediatek/mtk-eint.c
+++ b/drivers/pinctrl/mediatek/mtk-eint.c
@@ -113,6 +113,8 @@ static void mtk_eint_mask(struct irq_data *d)
 	void __iomem *reg = mtk_eint_get_offset(eint, d->hwirq,
 						eint->regs->mask_set);
 
+	eint->cur_mask[d->hwirq >> 5] &= ~mask;
+
 	writel(mask, reg);
 }
 
@@ -123,6 +125,8 @@ static void mtk_eint_unmask(struct irq_data *d)
 	void __iomem *reg = mtk_eint_get_offset(eint, d->hwirq,
 						eint->regs->mask_clr);
 
+	eint->cur_mask[d->hwirq >> 5] |= mask;
+
 	writel(mask, reg);
 
 	if (eint->dual_edge[d->hwirq])
@@ -217,19 +221,6 @@ static void mtk_eint_chip_write_mask(const struct mtk_eint *eint,
 	}
 }
 
-static void mtk_eint_chip_read_mask(const struct mtk_eint *eint,
-				    void __iomem *base, u32 *buf)
-{
-	int port;
-	void __iomem *reg;
-
-	for (port = 0; port < eint->hw->ports; port++) {
-		reg = base + eint->regs->mask + (port << 2);
-		buf[port] = ~readl_relaxed(reg);
-		/* Mask is 0 when irq is enabled, and 1 when disabled. */
-	}
-}
-
 static int mtk_eint_irq_request_resources(struct irq_data *d)
 {
 	struct mtk_eint *eint = irq_data_get_irq_chip_data(d);
@@ -384,7 +375,6 @@ static void mtk_eint_irq_handler(struct irq_desc *desc)
 
 int mtk_eint_do_suspend(struct mtk_eint *eint)
 {
-	mtk_eint_chip_read_mask(eint, eint->base, eint->cur_mask);
 	mtk_eint_chip_write_mask(eint, eint->base, eint->wake_mask);
 
 	return 0;
-- 
2.21.0.593.g511ec345e18-goog


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [PATCH 2/2] pinctrl: mediatek: Update cur_mask in mask/mask ops
  2019-04-29  3:55 ` [PATCH 2/2] pinctrl: mediatek: Update cur_mask in mask/mask ops Nicolas Boichat
@ 2019-05-13 22:29   ` Stephen Boyd
  2019-05-14  1:37     ` Nicolas Boichat
  2019-06-21  4:20   ` Sean Wang
  1 sibling, 1 reply; 18+ messages in thread
From: Stephen Boyd @ 2019-05-13 22:29 UTC (permalink / raw)
  To: Nicolas Boichat, linux-mediatek
  Cc: Sean Wang, Linus Walleij, Matthias Brugger, linux-gpio,
	linux-arm-kernel, linux-kernel, Chuanjia Liu, evgreen

Quoting Nicolas Boichat (2019-04-28 20:55:15)
> During suspend/resume, mtk_eint_mask may be called while
> wake_mask is active. For example, this happens if a wake-source
> with an active interrupt handler wakes the system:
> irq/pm.c:irq_pm_check_wakeup would disable the interrupt, so
> that it can be handled later on in the resume flow.
> 
> However, this may happen before mtk_eint_do_resume is called:
> in this case, wake_mask is loaded, and cur_mask is restored
> from an older copy, re-enabling the interrupt, and causing
> an interrupt storm (especially for level interrupts).
> 
> Instead, we just record mask/unmask changes in cur_mask. This
> also avoids the need to read the current mask in eint_do_suspend,
> and we can remove mtk_eint_chip_read_mask function.
> 
> Signed-off-by: Nicolas Boichat <drinkcat@chromium.org>

It looks an awful lot like you should just use IRQCHIP_MASK_ON_SUSPEND
here. Isn't that what's happening? All non-wake irqs should be masked at
the hardware level so they can't cause a wakeup during suspend and on
resume they can be unmasked?

> diff --git a/drivers/pinctrl/mediatek/mtk-eint.c b/drivers/pinctrl/mediatek/mtk-eint.c
> index 737385e86beb807..7e526bcf5e0b55c 100644
> --- a/drivers/pinctrl/mediatek/mtk-eint.c
> +++ b/drivers/pinctrl/mediatek/mtk-eint.c
> @@ -113,6 +113,8 @@ static void mtk_eint_mask(struct irq_data *d)
>         void __iomem *reg = mtk_eint_get_offset(eint, d->hwirq,
>                                                 eint->regs->mask_set);
>  
> +       eint->cur_mask[d->hwirq >> 5] &= ~mask;
> +
>         writel(mask, reg);
>  }
>  
> @@ -123,6 +125,8 @@ static void mtk_eint_unmask(struct irq_data *d)
>         void __iomem *reg = mtk_eint_get_offset(eint, d->hwirq,
>                                                 eint->regs->mask_clr);
>  
> +       eint->cur_mask[d->hwirq >> 5] |= mask;
> +
>         writel(mask, reg);
>  
>         if (eint->dual_edge[d->hwirq])
> @@ -384,7 +375,6 @@ static void mtk_eint_irq_handler(struct irq_desc *desc)
>  
>  int mtk_eint_do_suspend(struct mtk_eint *eint)
>  {
> -       mtk_eint_chip_read_mask(eint, eint->base, eint->cur_mask);
>         mtk_eint_chip_write_mask(eint, eint->base, eint->wake_mask);
>  


This alone looks like, write out the mask to only allow wake interrupts.


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 2/2] pinctrl: mediatek: Update cur_mask in mask/mask ops
  2019-05-13 22:29   ` Stephen Boyd
@ 2019-05-14  1:37     ` Nicolas Boichat
  2019-05-14 20:14       ` Stephen Boyd
  0 siblings, 1 reply; 18+ messages in thread
From: Nicolas Boichat @ 2019-05-14  1:37 UTC (permalink / raw)
  To: Stephen Boyd
  Cc: moderated list:ARM/Mediatek SoC support, Sean Wang,
	Linus Walleij, Matthias Brugger, linux-gpio,
	linux-arm Mailing List, lkml, Chuanjia Liu, Evan Green

On Tue, May 14, 2019 at 6:29 AM Stephen Boyd <swboyd@chromium.org> wrote:
>
> Quoting Nicolas Boichat (2019-04-28 20:55:15)
> > During suspend/resume, mtk_eint_mask may be called while
> > wake_mask is active. For example, this happens if a wake-source
> > with an active interrupt handler wakes the system:
> > irq/pm.c:irq_pm_check_wakeup would disable the interrupt, so
> > that it can be handled later on in the resume flow.
> >
> > However, this may happen before mtk_eint_do_resume is called:
> > in this case, wake_mask is loaded, and cur_mask is restored
> > from an older copy, re-enabling the interrupt, and causing
> > an interrupt storm (especially for level interrupts).
> >
> > Instead, we just record mask/unmask changes in cur_mask. This
> > also avoids the need to read the current mask in eint_do_suspend,
> > and we can remove mtk_eint_chip_read_mask function.
> >
> > Signed-off-by: Nicolas Boichat <drinkcat@chromium.org>
>
> It looks an awful lot like you should just use IRQCHIP_MASK_ON_SUSPEND
> here. Isn't that what's happening? All non-wake irqs should be masked at
> the hardware level so they can't cause a wakeup during suspend and on
> resume they can be unmasked?

No, this is for an line that has both wake and interrupt enabled. To
reword the commit message above:
 1. cur_mask[irq] = 1; wake_mask[irq] = 1; EINT_EN[irq] = 1 (interrupt
enabled at hardware level)
 2. System suspends, resumes due to that line (at this stage EINT_HW
== wake_mask)
 3. irq_pm_check_wakeup is called, and disables the interrupt =>
EINT_EN[irq] = 0, but we still have cur_mask[irq] = 1
 4. mtk_eint_do_resume is called, and restores EINT_EN = cur_mask, so
it reenables EINT_EN[irq] = 1 => interrupt storm.

This patch fixes the issue in step 3. So that the interrupt can be
re-enabled properly later on, sometimes after mtk_eint_do_resume, when
the driver is ready to handle it.

Also, IRQCHIP_MASK_ON_SUSPEND does not handle lines that are enabled
as a wake source, but without interrupt enabled (e.g. cros_ec driver
does that), which we do want to support.

> > diff --git a/drivers/pinctrl/mediatek/mtk-eint.c b/drivers/pinctrl/mediatek/mtk-eint.c
> > index 737385e86beb807..7e526bcf5e0b55c 100644
> > --- a/drivers/pinctrl/mediatek/mtk-eint.c
> > +++ b/drivers/pinctrl/mediatek/mtk-eint.c
> > @@ -113,6 +113,8 @@ static void mtk_eint_mask(struct irq_data *d)
> >         void __iomem *reg = mtk_eint_get_offset(eint, d->hwirq,
> >                                                 eint->regs->mask_set);
> >
> > +       eint->cur_mask[d->hwirq >> 5] &= ~mask;
> > +
> >         writel(mask, reg);
> >  }
> >
> > @@ -123,6 +125,8 @@ static void mtk_eint_unmask(struct irq_data *d)
> >         void __iomem *reg = mtk_eint_get_offset(eint, d->hwirq,
> >                                                 eint->regs->mask_clr);
> >
> > +       eint->cur_mask[d->hwirq >> 5] |= mask;
> > +
> >         writel(mask, reg);
> >
> >         if (eint->dual_edge[d->hwirq])
> > @@ -384,7 +375,6 @@ static void mtk_eint_irq_handler(struct irq_desc *desc)
> >
> >  int mtk_eint_do_suspend(struct mtk_eint *eint)
> >  {
> > -       mtk_eint_chip_read_mask(eint, eint->base, eint->cur_mask);
> >         mtk_eint_chip_write_mask(eint, eint->base, eint->wake_mask);
> >
>
>
> This alone looks like, write out the mask to only allow wake interrupts.

Yes, and enable wake interrupts that may not be in cur_mask.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 2/2] pinctrl: mediatek: Update cur_mask in mask/mask ops
  2019-05-14  1:37     ` Nicolas Boichat
@ 2019-05-14 20:14       ` Stephen Boyd
  2019-05-15  8:04         ` Nicolas Boichat
  0 siblings, 1 reply; 18+ messages in thread
From: Stephen Boyd @ 2019-05-14 20:14 UTC (permalink / raw)
  To: Nicolas Boichat
  Cc: moderated list:ARM/Mediatek SoC support, Sean Wang,
	Linus Walleij, Matthias Brugger, linux-gpio,
	linux-arm Mailing List, lkml, Chuanjia Liu, Evan Green,
	Thomas Gleixner

Quoting Nicolas Boichat (2019-05-13 18:37:58)
> On Tue, May 14, 2019 at 6:29 AM Stephen Boyd <swboyd@chromium.org> wrote:
> >
> > Quoting Nicolas Boichat (2019-04-28 20:55:15)
> > > During suspend/resume, mtk_eint_mask may be called while
> > > wake_mask is active. For example, this happens if a wake-source
> > > with an active interrupt handler wakes the system:
> > > irq/pm.c:irq_pm_check_wakeup would disable the interrupt, so
> > > that it can be handled later on in the resume flow.
> > >
> > > However, this may happen before mtk_eint_do_resume is called:
> > > in this case, wake_mask is loaded, and cur_mask is restored
> > > from an older copy, re-enabling the interrupt, and causing
> > > an interrupt storm (especially for level interrupts).
> > >
> > > Instead, we just record mask/unmask changes in cur_mask. This
> > > also avoids the need to read the current mask in eint_do_suspend,
> > > and we can remove mtk_eint_chip_read_mask function.
> > >
> > > Signed-off-by: Nicolas Boichat <drinkcat@chromium.org>
> >
> > It looks an awful lot like you should just use IRQCHIP_MASK_ON_SUSPEND
> > here. Isn't that what's happening? All non-wake irqs should be masked at
> > the hardware level so they can't cause a wakeup during suspend and on
> > resume they can be unmasked?
> 
> No, this is for an line that has both wake and interrupt enabled. To
> reword the commit message above:

Is my understanding correct that there isn't a different "wake up"
register that can be written to cause a GPIO to be configured to wake
the system from suspend? The only way to do so is to leave the GPIO
unmasked in the hardware by having EINT_EN[irq] = 1? And thus any
interrupts that we don't want to wake us up during suspend should be
masked in the hardware? 

If that's true, the code here that's trying to keep track of enabled
irqs and wakeup enabled irqs can be replaced with the irqchip flag so
that wakeup irqs are not masked while non-wakeups are masked.


>  1. cur_mask[irq] = 1; wake_mask[irq] = 1; EINT_EN[irq] = 1 (interrupt
> enabled at hardware level)
>  2. System suspends, resumes due to that line (at this stage EINT_HW
> == wake_mask)
>  3. irq_pm_check_wakeup is called, and disables the interrupt =>
> EINT_EN[irq] = 0, but we still have cur_mask[irq] = 1
>  4. mtk_eint_do_resume is called, and restores EINT_EN = cur_mask, so
> it reenables EINT_EN[irq] = 1 => interrupt storm.
> 
> This patch fixes the issue in step 3. So that the interrupt can be
> re-enabled properly later on, sometimes after mtk_eint_do_resume, when
> the driver is ready to handle it.

Right, we'd rather not see irqchip drivers working around the genirq
layer to do these things like tracking cur_mask and wake_mask. That
leads to subtle bugs and makes the driver maintain state across the
irqchip callbacks and system suspend/resume.

> 
> Also, IRQCHIP_MASK_ON_SUSPEND does not handle lines that are enabled
> as a wake source, but without interrupt enabled (e.g. cros_ec driver
> does that), which we do want to support.

Hmm. I thought that even if the irq is disabled by a driver, that would
be a lazy disable so it isn't really masked in the hardware. Then if an
interrupt comes in during suspend on a wake configured irq line, the
hardware will have left it unmasked because IRQCHIP_MASK_ON_SUSPEND in
combination with lazy disable would mean that the line is left unmasked
(ignoring whatever this mediatek driver is doing to mask and unmask in
PM hooks).

Just reading Documentation/power/suspend-and-interrupts.txt I'm led to
believe that the cros_ec driver shouldn't call disable_irq() on the
interrupt if it wants to wakeup from it:

"Calling enable_irq_wake() causes suspend_device_irqs() to treat the
given IRQ in a special way.  Namely, the IRQ remains enabled, by on the
first interrupt it will be disabled, marked as pending and "suspended"
so that it will be re-enabled by resume_device_irqs() during the
subsequent system resume.  Also the PM core is notified about the event
which causes the system suspend in progress to be aborted (that doesn't
have to happen immediately, but at one of the points where the suspend
thread looks for pending wakeup events)."

I suppose the problem is an irq line disabled in hardware that has
wakeup armed on it? Is this even valid? Shouldn't an irq be enabled for
wakeup to work?

We could immediately unmask those lines in the hardware when the
set_wake() callback is called. That way the genirq layer can use the
driver to do what it wants with the hardware and the driver can make
sure that set_wake() will always cause the wakeup interrupt to be
delivered to genirq even when software has disabled it.

But I think that there might be a problem with how genirq understands
the masked state of a line when the wakeup implementation conflates
masked state with wakeup armed state. Consider this call-flow:

	irq masked in hardware, IRQD_IRQ_MASKED is set
	enable_irq_wake()
	  unmask_irq in hardware
	IRQD_WAKEUP_ARMED is set
	<suspend and wakeup from irq>
	handle_level_irq()
	  mask_ack_irq()
	    mask_irq()
	      if (irqd_irq_masked()) -> returns true and skips masking!
	    if (desc->irq_data.chip->irq_ack)
	      ...
	  irq_may_run()
	    irq_pm_check_wakeup()
	      irq_disable()
	        mask_irq() -> does nothing again

In the above flow, we never mask the irq because we thought it was
already masked when it was disabled, but the irqchip implementation
unmasked it to make wakeup work. Maybe we should always mask the irq if
wakeup is armed and we're trying to call mask_irq()? Looks hacky.

diff --git a/kernel/irq/chip.c b/kernel/irq/chip.c
index 51128bea3846..20257d528880 100644
--- a/kernel/irq/chip.c
+++ b/kernel/irq/chip.c
@@ -411,7 +411,7 @@ static inline void mask_ack_irq(struct irq_desc *desc)
 
 void mask_irq(struct irq_desc *desc)
 {
-	if (irqd_irq_masked(&desc->irq_data))
+	if (!irqd_is_wakeup_armed(&desc->irq_data) && irqd_irq_masked(&desc->irq_data))
 		return;
 
 	if (desc->irq_data.chip->irq_mask) {



^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [PATCH 2/2] pinctrl: mediatek: Update cur_mask in mask/mask ops
  2019-05-14 20:14       ` Stephen Boyd
@ 2019-05-15  8:04         ` Nicolas Boichat
  2019-05-30 17:12           ` Evan Green
  0 siblings, 1 reply; 18+ messages in thread
From: Nicolas Boichat @ 2019-05-15  8:04 UTC (permalink / raw)
  To: Stephen Boyd
  Cc: moderated list:ARM/Mediatek SoC support, Sean Wang,
	Linus Walleij, Matthias Brugger, linux-gpio,
	linux-arm Mailing List, lkml, Chuanjia Liu, Evan Green,
	Thomas Gleixner

On Wed, May 15, 2019 at 4:14 AM Stephen Boyd <swboyd@chromium.org> wrote:
>
> Quoting Nicolas Boichat (2019-05-13 18:37:58)
> > On Tue, May 14, 2019 at 6:29 AM Stephen Boyd <swboyd@chromium.org> wrote:
> > >
> > > Quoting Nicolas Boichat (2019-04-28 20:55:15)
> > > > During suspend/resume, mtk_eint_mask may be called while
> > > > wake_mask is active. For example, this happens if a wake-source
> > > > with an active interrupt handler wakes the system:
> > > > irq/pm.c:irq_pm_check_wakeup would disable the interrupt, so
> > > > that it can be handled later on in the resume flow.
> > > >
> > > > However, this may happen before mtk_eint_do_resume is called:
> > > > in this case, wake_mask is loaded, and cur_mask is restored
> > > > from an older copy, re-enabling the interrupt, and causing
> > > > an interrupt storm (especially for level interrupts).
> > > >
> > > > Instead, we just record mask/unmask changes in cur_mask. This
> > > > also avoids the need to read the current mask in eint_do_suspend,
> > > > and we can remove mtk_eint_chip_read_mask function.
> > > >
> > > > Signed-off-by: Nicolas Boichat <drinkcat@chromium.org>
> > >
> > > It looks an awful lot like you should just use IRQCHIP_MASK_ON_SUSPEND
> > > here. Isn't that what's happening? All non-wake irqs should be masked at
> > > the hardware level so they can't cause a wakeup during suspend and on
> > > resume they can be unmasked?
> >
> > No, this is for an line that has both wake and interrupt enabled. To
> > reword the commit message above:
>
> Is my understanding correct that there isn't a different "wake up"
> register that can be written to cause a GPIO to be configured to wake
> the system from suspend? The only way to do so is to leave the GPIO
> unmasked in the hardware by having EINT_EN[irq] = 1? And thus any
> interrupts that we don't want to wake us up during suspend should be
> masked in the hardware?

Yes, that's my understanding as well.

And then, what this driver does is to emulate the behaviour of a
controller that would actually have separate irq and wake enable
registers.

> If that's true, the code here that's trying to keep track of enabled
> irqs and wakeup enabled irqs can be replaced with the irqchip flag so
> that wakeup irqs are not masked while non-wakeups are masked.

Correct, but with the caveat that I don't see anything that definitely
requires an interrupt to be enabled to be a wake source. See below...

>
> >  1. cur_mask[irq] = 1; wake_mask[irq] = 1; EINT_EN[irq] = 1 (interrupt
> > enabled at hardware level)
> >  2. System suspends, resumes due to that line (at this stage EINT_HW
> > == wake_mask)
> >  3. irq_pm_check_wakeup is called, and disables the interrupt =>
> > EINT_EN[irq] = 0, but we still have cur_mask[irq] = 1
> >  4. mtk_eint_do_resume is called, and restores EINT_EN = cur_mask, so
> > it reenables EINT_EN[irq] = 1 => interrupt storm.
> >
> > This patch fixes the issue in step 3. So that the interrupt can be
> > re-enabled properly later on, sometimes after mtk_eint_do_resume, when
> > the driver is ready to handle it.
>
> Right, we'd rather not see irqchip drivers working around the genirq
> layer to do these things like tracking cur_mask and wake_mask. That
> leads to subtle bugs and makes the driver maintain state across the
> irqchip callbacks and system suspend/resume.
>
> >
> > Also, IRQCHIP_MASK_ON_SUSPEND does not handle lines that are enabled
> > as a wake source, but without interrupt enabled (e.g. cros_ec driver
> > does that), which we do want to support.
>
> Hmm. I thought that even if the irq is disabled by a driver, that would
> be a lazy disable so it isn't really masked in the hardware. Then if an
> interrupt comes in during suspend on a wake configured irq line, the
> hardware will have left it unmasked because IRQCHIP_MASK_ON_SUSPEND in
> combination with lazy disable would mean that the line is left unmasked
> (ignoring whatever this mediatek driver is doing to mask and unmask in
> PM hooks).

At the very least, that's not what happens with this system. The
interrupt is definitely not kept enabled in suspend, and the system
would not wake from an EC interrupt. (see also this series, BTW:
https://patchwork.kernel.org/cover/10921121/).

> Just reading Documentation/power/suspend-and-interrupts.txt I'm led to
> believe that the cros_ec driver shouldn't call disable_irq() on the
> interrupt if it wants to wakeup from it:
>
> "Calling enable_irq_wake() causes suspend_device_irqs() to treat the
> given IRQ in a special way.  Namely, the IRQ remains enabled, by on the
> first interrupt it will be disabled, marked as pending and "suspended"
> so that it will be re-enabled by resume_device_irqs() during the
> subsequent system resume.  Also the PM core is notified about the event
> which causes the system suspend in progress to be aborted (that doesn't
> have to happen immediately, but at one of the points where the suspend
> thread looks for pending wakeup events)."

I think this describes the behaviour when you keep both enabled.

> I suppose the problem is an irq line disabled in hardware that has
> wakeup armed on it? Is this even valid? Shouldn't an irq be enabled for
> wakeup to work?

I couldn't really find a definite answer, but there are a bunch of
examples of other drivers in the kernel:
 - drivers/extcon/extcon-usb-gpio.c:usb_extcon_suspend
 - drivers/hid/i2c-hid/i2c-hid.c:i2c_hid_suspend
 - drivers/mfd/max77843.c:max77843_suspend
(not exhaustive, this is quite hard to grep for...)

> We could immediately unmask those lines in the hardware when the
> set_wake() callback is called. That way the genirq layer can use the
> driver to do what it wants with the hardware and the driver can make
> sure that set_wake() will always cause the wakeup interrupt to be
> delivered to genirq even when software has disabled it.
>
> But I think that there might be a problem with how genirq understands
> the masked state of a line when the wakeup implementation conflates
> masked state with wakeup armed state. Consider this call-flow:
>
>         irq masked in hardware, IRQD_IRQ_MASKED is set
>         enable_irq_wake()
>           unmask_irq in hardware
>         IRQD_WAKEUP_ARMED is set
>         <suspend and wakeup from irq>
>         handle_level_irq()
>           mask_ack_irq()
>             mask_irq()
>               if (irqd_irq_masked()) -> returns true and skips masking!
>             if (desc->irq_data.chip->irq_ack)
>               ...
>           irq_may_run()
>             irq_pm_check_wakeup()
>               irq_disable()
>                 mask_irq() -> does nothing again
>
> In the above flow, we never mask the irq because we thought it was
> already masked when it was disabled, but the irqchip implementation
> unmasked it to make wakeup work. Maybe we should always mask the irq if
> wakeup is armed and we're trying to call mask_irq()? Looks hacky.
>
> diff --git a/kernel/irq/chip.c b/kernel/irq/chip.c
> index 51128bea3846..20257d528880 100644
> --- a/kernel/irq/chip.c
> +++ b/kernel/irq/chip.c
> @@ -411,7 +411,7 @@ static inline void mask_ack_irq(struct irq_desc *desc)
>
>  void mask_irq(struct irq_desc *desc)
>  {
> -       if (irqd_irq_masked(&desc->irq_data))
> +       if (!irqd_is_wakeup_armed(&desc->irq_data) && irqd_irq_masked(&desc->irq_data))
>                 return;
>
>         if (desc->irq_data.chip->irq_mask) {

I'm... really not sure what's the best approach here. But basically,
yes, if we can find a way to properly handle wake and interrupt
behaviour for drivers with a single mask, that'd be good.
IRQCHIP_MASK_ON_SUSPEND only seems to be doing half of the work, since
it does not cover the disable+wake source case.

Thanks,

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 2/2] pinctrl: mediatek: Update cur_mask in mask/mask ops
  2019-05-15  8:04         ` Nicolas Boichat
@ 2019-05-30 17:12           ` Evan Green
  2019-05-31  8:05             ` Chuanjia Liu
  2019-06-10 22:25             ` Stephen Boyd
  0 siblings, 2 replies; 18+ messages in thread
From: Evan Green @ 2019-05-30 17:12 UTC (permalink / raw)
  To: Nicolas Boichat
  Cc: Stephen Boyd, moderated list:ARM/Mediatek SoC support, Sean Wang,
	Linus Walleij, Matthias Brugger, linux-gpio,
	linux-arm Mailing List, lkml, Chuanjia Liu, Thomas Gleixner

On Wed, May 15, 2019 at 1:05 AM Nicolas Boichat <drinkcat@chromium.org> wrote:
>
> On Wed, May 15, 2019 at 4:14 AM Stephen Boyd <swboyd@chromium.org> wrote:
> >
> > Quoting Nicolas Boichat (2019-05-13 18:37:58)
> > > On Tue, May 14, 2019 at 6:29 AM Stephen Boyd <swboyd@chromium.org> wrote:
> > > >
> > > > Quoting Nicolas Boichat (2019-04-28 20:55:15)
> > > > > During suspend/resume, mtk_eint_mask may be called while
> > > > > wake_mask is active. For example, this happens if a wake-source
> > > > > with an active interrupt handler wakes the system:
> > > > > irq/pm.c:irq_pm_check_wakeup would disable the interrupt, so
> > > > > that it can be handled later on in the resume flow.
> > > > >
> > > > > However, this may happen before mtk_eint_do_resume is called:
> > > > > in this case, wake_mask is loaded, and cur_mask is restored
> > > > > from an older copy, re-enabling the interrupt, and causing
> > > > > an interrupt storm (especially for level interrupts).
> > > > >
> > > > > Instead, we just record mask/unmask changes in cur_mask. This
> > > > > also avoids the need to read the current mask in eint_do_suspend,
> > > > > and we can remove mtk_eint_chip_read_mask function.
> > > > >
> > > > > Signed-off-by: Nicolas Boichat <drinkcat@chromium.org>
> > > >
> > > > It looks an awful lot like you should just use IRQCHIP_MASK_ON_SUSPEND
> > > > here. Isn't that what's happening? All non-wake irqs should be masked at
> > > > the hardware level so they can't cause a wakeup during suspend and on
> > > > resume they can be unmasked?
> > >
> > > No, this is for an line that has both wake and interrupt enabled. To
> > > reword the commit message above:
> >
> > Is my understanding correct that there isn't a different "wake up"
> > register that can be written to cause a GPIO to be configured to wake
> > the system from suspend? The only way to do so is to leave the GPIO
> > unmasked in the hardware by having EINT_EN[irq] = 1? And thus any
> > interrupts that we don't want to wake us up during suspend should be
> > masked in the hardware?
>
> Yes, that's my understanding as well.
>
> And then, what this driver does is to emulate the behaviour of a
> controller that would actually have separate irq and wake enable
> registers.
>
> > If that's true, the code here that's trying to keep track of enabled
> > irqs and wakeup enabled irqs can be replaced with the irqchip flag so
> > that wakeup irqs are not masked while non-wakeups are masked.
>
> Correct, but with the caveat that I don't see anything that definitely
> requires an interrupt to be enabled to be a wake source. See below...
>
> >
> > >  1. cur_mask[irq] = 1; wake_mask[irq] = 1; EINT_EN[irq] = 1 (interrupt
> > > enabled at hardware level)
> > >  2. System suspends, resumes due to that line (at this stage EINT_HW
> > > == wake_mask)
> > >  3. irq_pm_check_wakeup is called, and disables the interrupt =>
> > > EINT_EN[irq] = 0, but we still have cur_mask[irq] = 1
> > >  4. mtk_eint_do_resume is called, and restores EINT_EN = cur_mask, so
> > > it reenables EINT_EN[irq] = 1 => interrupt storm.
> > >
> > > This patch fixes the issue in step 3. So that the interrupt can be
> > > re-enabled properly later on, sometimes after mtk_eint_do_resume, when
> > > the driver is ready to handle it.
> >
> > Right, we'd rather not see irqchip drivers working around the genirq
> > layer to do these things like tracking cur_mask and wake_mask. That
> > leads to subtle bugs and makes the driver maintain state across the
> > irqchip callbacks and system suspend/resume.
> >
> > >
> > > Also, IRQCHIP_MASK_ON_SUSPEND does not handle lines that are enabled
> > > as a wake source, but without interrupt enabled (e.g. cros_ec driver
> > > does that), which we do want to support.
> >
> > Hmm. I thought that even if the irq is disabled by a driver, that would
> > be a lazy disable so it isn't really masked in the hardware. Then if an
> > interrupt comes in during suspend on a wake configured irq line, the
> > hardware will have left it unmasked because IRQCHIP_MASK_ON_SUSPEND in
> > combination with lazy disable would mean that the line is left unmasked
> > (ignoring whatever this mediatek driver is doing to mask and unmask in
> > PM hooks).
>
> At the very least, that's not what happens with this system. The
> interrupt is definitely not kept enabled in suspend, and the system
> would not wake from an EC interrupt. (see also this series, BTW:
> https://patchwork.kernel.org/cover/10921121/).
>
> > Just reading Documentation/power/suspend-and-interrupts.txt I'm led to
> > believe that the cros_ec driver shouldn't call disable_irq() on the
> > interrupt if it wants to wakeup from it:
> >
> > "Calling enable_irq_wake() causes suspend_device_irqs() to treat the
> > given IRQ in a special way.  Namely, the IRQ remains enabled, by on the
> > first interrupt it will be disabled, marked as pending and "suspended"
> > so that it will be re-enabled by resume_device_irqs() during the
> > subsequent system resume.  Also the PM core is notified about the event
> > which causes the system suspend in progress to be aborted (that doesn't
> > have to happen immediately, but at one of the points where the suspend
> > thread looks for pending wakeup events)."
>
> I think this describes the behaviour when you keep both enabled.
>
> > I suppose the problem is an irq line disabled in hardware that has
> > wakeup armed on it? Is this even valid? Shouldn't an irq be enabled for
> > wakeup to work?
>
> I couldn't really find a definite answer, but there are a bunch of
> examples of other drivers in the kernel:
>  - drivers/extcon/extcon-usb-gpio.c:usb_extcon_suspend
>  - drivers/hid/i2c-hid/i2c-hid.c:i2c_hid_suspend
>  - drivers/mfd/max77843.c:max77843_suspend
> (not exhaustive, this is quite hard to grep for...)
>
> > We could immediately unmask those lines in the hardware when the
> > set_wake() callback is called. That way the genirq layer can use the
> > driver to do what it wants with the hardware and the driver can make
> > sure that set_wake() will always cause the wakeup interrupt to be
> > delivered to genirq even when software has disabled it.
> >
> > But I think that there might be a problem with how genirq understands
> > the masked state of a line when the wakeup implementation conflates
> > masked state with wakeup armed state. Consider this call-flow:
> >
> >         irq masked in hardware, IRQD_IRQ_MASKED is set
> >         enable_irq_wake()
> >           unmask_irq in hardware
> >         IRQD_WAKEUP_ARMED is set
> >         <suspend and wakeup from irq>
> >         handle_level_irq()
> >           mask_ack_irq()
> >             mask_irq()
> >               if (irqd_irq_masked()) -> returns true and skips masking!
> >             if (desc->irq_data.chip->irq_ack)
> >               ...
> >           irq_may_run()
> >             irq_pm_check_wakeup()
> >               irq_disable()
> >                 mask_irq() -> does nothing again
> >
> > In the above flow, we never mask the irq because we thought it was
> > already masked when it was disabled, but the irqchip implementation
> > unmasked it to make wakeup work. Maybe we should always mask the irq if
> > wakeup is armed and we're trying to call mask_irq()? Looks hacky.
> >
> > diff --git a/kernel/irq/chip.c b/kernel/irq/chip.c
> > index 51128bea3846..20257d528880 100644
> > --- a/kernel/irq/chip.c
> > +++ b/kernel/irq/chip.c
> > @@ -411,7 +411,7 @@ static inline void mask_ack_irq(struct irq_desc *desc)
> >
> >  void mask_irq(struct irq_desc *desc)
> >  {
> > -       if (irqd_irq_masked(&desc->irq_data))
> > +       if (!irqd_is_wakeup_armed(&desc->irq_data) && irqd_irq_masked(&desc->irq_data))
> >                 return;
> >
> >         if (desc->irq_data.chip->irq_mask) {
>
> I'm... really not sure what's the best approach here. But basically,
> yes, if we can find a way to properly handle wake and interrupt
> behaviour for drivers with a single mask, that'd be good.
> IRQCHIP_MASK_ON_SUSPEND only seems to be doing half of the work, since
> it does not cover the disable+wake source case.
>
> Thanks,

I finally got around to studying this patch. This series seems okay to
me. The underlying problem is really that the hardware IRQ enabled
state is out of sync with what Linux thinks. This happens during
suspend because Linux thinks the irq is disabled, but due to the
hardware constraints on this platform, the interrupt has to be enabled
for it to be a wake source. So the mtk driver re-enables the
interrupt, and then has to find a way to get back in sync with Linux's
IRQ mask state.

One possible approach is mentioned above by Stephen: stop calling
disable_irq in the cros EC driver. Then both linux and mtk agree the
interrupt is enabled at suspend time. I think this ran into other
problems though, where the EC gets its interrupt but is unable to
silence it because the underlying SPI bus is still suspended.

The other approach, taken here, is to mask the interrupt when it first
comes in, getting Linux and mtk back in agreement that yes, the
interrupt is masked. Outside of enlightening the generic IRQ core
about these types of interrupts that need to get re-enabled to be wake
sources, this seems like a reasonable approach.
-Evan

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 2/2] pinctrl: mediatek: Update cur_mask in mask/mask ops
  2019-05-30 17:12           ` Evan Green
@ 2019-05-31  8:05             ` Chuanjia Liu
  2019-05-31 17:17               ` Evan Green
  2019-06-10 22:25             ` Stephen Boyd
  1 sibling, 1 reply; 18+ messages in thread
From: Chuanjia Liu @ 2019-05-31  8:05 UTC (permalink / raw)
  To: Evan Green
  Cc: Nicolas Boichat, Stephen Boyd,
	moderated list:ARM/Mediatek SoC support, Sean Wang,
	Linus Walleij, Matthias Brugger, linux-gpio,
	linux-arm Mailing List, lkml, Thomas Gleixner

On Thu, 2019-05-30 at 10:12 -0700, Evan Green wrote:
> On Wed, May 15, 2019 at 1:05 AM Nicolas Boichat <drinkcat@chromium.org> wrote:
> >
> > On Wed, May 15, 2019 at 4:14 AM Stephen Boyd <swboyd@chromium.org> wrote:
> > >
> > > Quoting Nicolas Boichat (2019-05-13 18:37:58)
> > > > On Tue, May 14, 2019 at 6:29 AM Stephen Boyd <swboyd@chromium.org> wrote:
> > > > >
> > > > > Quoting Nicolas Boichat (2019-04-28 20:55:15)
> > > > > > During suspend/resume, mtk_eint_mask may be called while
> > > > > > wake_mask is active. For example, this happens if a wake-source
> > > > > > with an active interrupt handler wakes the system:
> > > > > > irq/pm.c:irq_pm_check_wakeup would disable the interrupt, so
> > > > > > that it can be handled later on in the resume flow.
> > > > > >
> > > > > > However, this may happen before mtk_eint_do_resume is called:
> > > > > > in this case, wake_mask is loaded, and cur_mask is restored
> > > > > > from an older copy, re-enabling the interrupt, and causing
> > > > > > an interrupt storm (especially for level interrupts).
> > > > > >
> > > > > > Instead, we just record mask/unmask changes in cur_mask. This
> > > > > > also avoids the need to read the current mask in eint_do_suspend,
> > > > > > and we can remove mtk_eint_chip_read_mask function.
> > > > > >
> > > > > > Signed-off-by: Nicolas Boichat <drinkcat@chromium.org>
> > > > >
> > > > > It looks an awful lot like you should just use IRQCHIP_MASK_ON_SUSPEND
> > > > > here. Isn't that what's happening? All non-wake irqs should be masked at
> > > > > the hardware level so they can't cause a wakeup during suspend and on
> > > > > resume they can be unmasked?
> > > >
> > > > No, this is for an line that has both wake and interrupt enabled. To
> > > > reword the commit message above:
> > >
> > > Is my understanding correct that there isn't a different "wake up"
> > > register that can be written to cause a GPIO to be configured to wake
> > > the system from suspend? The only way to do so is to leave the GPIO
> > > unmasked in the hardware by having EINT_EN[irq] = 1? And thus any
> > > interrupts that we don't want to wake us up during suspend should be
> > > masked in the hardware?
> >
> > Yes, that's my understanding as well.
> >
> > And then, what this driver does is to emulate the behaviour of a
> > controller that would actually have separate irq and wake enable
> > registers.
> >
> > > If that's true, the code here that's trying to keep track of enabled
> > > irqs and wakeup enabled irqs can be replaced with the irqchip flag so
> > > that wakeup irqs are not masked while non-wakeups are masked.
> >
> > Correct, but with the caveat that I don't see anything that definitely
> > requires an interrupt to be enabled to be a wake source. See below...
> >
> > >
> > > >  1. cur_mask[irq] = 1; wake_mask[irq] = 1; EINT_EN[irq] = 1 (interrupt
> > > > enabled at hardware level)
> > > >  2. System suspends, resumes due to that line (at this stage EINT_HW
> > > > == wake_mask)
> > > >  3. irq_pm_check_wakeup is called, and disables the interrupt =>
> > > > EINT_EN[irq] = 0, but we still have cur_mask[irq] = 1
> > > >  4. mtk_eint_do_resume is called, and restores EINT_EN = cur_mask, so
> > > > it reenables EINT_EN[irq] = 1 => interrupt storm.
> > > >
> > > > This patch fixes the issue in step 3. So that the interrupt can be
> > > > re-enabled properly later on, sometimes after mtk_eint_do_resume, when
> > > > the driver is ready to handle it.
> > >
> > > Right, we'd rather not see irqchip drivers working around the genirq
> > > layer to do these things like tracking cur_mask and wake_mask. That
> > > leads to subtle bugs and makes the driver maintain state across the
> > > irqchip callbacks and system suspend/resume.
> > >
> > > >
> > > > Also, IRQCHIP_MASK_ON_SUSPEND does not handle lines that are enabled
> > > > as a wake source, but without interrupt enabled (e.g. cros_ec driver
> > > > does that), which we do want to support.
> > >
> > > Hmm. I thought that even if the irq is disabled by a driver, that would
> > > be a lazy disable so it isn't really masked in the hardware. Then if an
> > > interrupt comes in during suspend on a wake configured irq line, the
> > > hardware will have left it unmasked because IRQCHIP_MASK_ON_SUSPEND in
> > > combination with lazy disable would mean that the line is left unmasked
> > > (ignoring whatever this mediatek driver is doing to mask and unmask in
> > > PM hooks).
> >
> > At the very least, that's not what happens with this system. The
> > interrupt is definitely not kept enabled in suspend, and the system
> > would not wake from an EC interrupt. (see also this series, BTW:
> > https://patchwork.kernel.org/cover/10921121/).
> >
> > > Just reading Documentation/power/suspend-and-interrupts.txt I'm led to
> > > believe that the cros_ec driver shouldn't call disable_irq() on the
> > > interrupt if it wants to wakeup from it:
> > >
> > > "Calling enable_irq_wake() causes suspend_device_irqs() to treat the
> > > given IRQ in a special way.  Namely, the IRQ remains enabled, by on the
> > > first interrupt it will be disabled, marked as pending and "suspended"
> > > so that it will be re-enabled by resume_device_irqs() during the
> > > subsequent system resume.  Also the PM core is notified about the event
> > > which causes the system suspend in progress to be aborted (that doesn't
> > > have to happen immediately, but at one of the points where the suspend
> > > thread looks for pending wakeup events)."
> >
> > I think this describes the behaviour when you keep both enabled.
> >
> > > I suppose the problem is an irq line disabled in hardware that has
> > > wakeup armed on it? Is this even valid? Shouldn't an irq be enabled for
> > > wakeup to work?
> >
> > I couldn't really find a definite answer, but there are a bunch of
> > examples of other drivers in the kernel:
> >  - drivers/extcon/extcon-usb-gpio.c:usb_extcon_suspend
> >  - drivers/hid/i2c-hid/i2c-hid.c:i2c_hid_suspend
> >  - drivers/mfd/max77843.c:max77843_suspend
> > (not exhaustive, this is quite hard to grep for...)
> >
> > > We could immediately unmask those lines in the hardware when the
> > > set_wake() callback is called. That way the genirq layer can use the
> > > driver to do what it wants with the hardware and the driver can make
> > > sure that set_wake() will always cause the wakeup interrupt to be
> > > delivered to genirq even when software has disabled it.
> > >
> > > But I think that there might be a problem with how genirq understands
> > > the masked state of a line when the wakeup implementation conflates
> > > masked state with wakeup armed state. Consider this call-flow:
> > >
> > >         irq masked in hardware, IRQD_IRQ_MASKED is set
> > >         enable_irq_wake()
> > >           unmask_irq in hardware
> > >         IRQD_WAKEUP_ARMED is set
> > >         <suspend and wakeup from irq>
> > >         handle_level_irq()
> > >           mask_ack_irq()
> > >             mask_irq()
> > >               if (irqd_irq_masked()) -> returns true and skips masking!
> > >             if (desc->irq_data.chip->irq_ack)
> > >               ...
> > >           irq_may_run()
> > >             irq_pm_check_wakeup()
> > >               irq_disable()
> > >                 mask_irq() -> does nothing again
> > >
> > > In the above flow, we never mask the irq because we thought it was
> > > already masked when it was disabled, but the irqchip implementation
> > > unmasked it to make wakeup work. Maybe we should always mask the irq if
> > > wakeup is armed and we're trying to call mask_irq()? Looks hacky.
Maybe we can implement irqchip's mask_ack_irq  in mediatek driver to
always mask the irq. Then flow will always call it without judgment
IRQD_IRQ_MASKED.

diff --git a/drivers/pinctrl/mediatek/mtk-eint.c
b/drivers/pinctrl/mediatek/mtk-
index f464f8c..9f1aae2 100644
--- a/drivers/pinctrl/mediatek/mtk-eint.c
+++ b/drivers/pinctrl/mediatek/mtk-eint.c
@@ -272,12 +272,19 @@ static void mtk_eint_irq_release_resources(struct
irq_data
        gpiochip_unlock_as_irq(gpio_c, gpio_n);
 }

+static void mtk_eint_mask_ack(struct irq_data *d)
+{
+       mtk_eint_mask(d);
+       mtk_eint_ack(d);
+}
+
 static struct irq_chip mtk_eint_irq_chip = {
        .name = "mt-eint",
        .irq_disable = mtk_eint_mask,
        .irq_mask = mtk_eint_mask,
        .irq_unmask = mtk_eint_unmask,
        .irq_ack = mtk_eint_ack,
+       .irq_mask_ack = mtk_eint_mask_ack,
        .irq_set_type = mtk_eint_set_type,
        .irq_set_wake = mtk_eint_irq_set_wake,
        .irq_request_resources = mtk_eint_irq_request_resources,

This seems like a small change.
thanks.
> > >
> > > diff --git a/kernel/irq/chip.c b/kernel/irq/chip.c
> > > index 51128bea3846..20257d528880 100644
> > > --- a/kernel/irq/chip.c
> > > +++ b/kernel/irq/chip.c
> > > @@ -411,7 +411,7 @@ static inline void mask_ack_irq(struct irq_desc *desc)
> > >
> > >  void mask_irq(struct irq_desc *desc)
> > >  {
> > > -       if (irqd_irq_masked(&desc->irq_data))
> > > +       if (!irqd_is_wakeup_armed(&desc->irq_data) && irqd_irq_masked(&desc->irq_data))
> > >                 return;
> > >
> > >         if (desc->irq_data.chip->irq_mask) {
> >
> > I'm... really not sure what's the best approach here. But basically,
> > yes, if we can find a way to properly handle wake and interrupt
> > behaviour for drivers with a single mask, that'd be good.
> > IRQCHIP_MASK_ON_SUSPEND only seems to be doing half of the work, since
> > it does not cover the disable+wake source case.
> >
> > Thanks,
> 
> I finally got around to studying this patch. This series seems okay to
> me. The underlying problem is really that the hardware IRQ enabled
> state is out of sync with what Linux thinks. This happens during
> suspend because Linux thinks the irq is disabled, but due to the
> hardware constraints on this platform, the interrupt has to be enabled
> for it to be a wake source. So the mtk driver re-enables the
> interrupt, and then has to find a way to get back in sync with Linux's
> IRQ mask state.
> 
> One possible approach is mentioned above by Stephen: stop calling
> disable_irq in the cros EC driver. Then both linux and mtk agree the
> interrupt is enabled at suspend time. I think this ran into other
> problems though, where the EC gets its interrupt but is unable to
> silence it because the underlying SPI bus is still suspended.
> 
> The other approach, taken here, is to mask the interrupt when it first
> comes in, getting Linux and mtk back in agreement that yes, the
> interrupt is masked. Outside of enlightening the generic IRQ core
> about these types of interrupts that need to get re-enabled to be wake
> sources, this seems like a reasonable approach.
> -Evan



^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [PATCH 2/2] pinctrl: mediatek: Update cur_mask in mask/mask ops
  2019-05-31  8:05             ` Chuanjia Liu
@ 2019-05-31 17:17               ` Evan Green
  2019-06-03  8:01                 ` Chuanjia Liu
  0 siblings, 1 reply; 18+ messages in thread
From: Evan Green @ 2019-05-31 17:17 UTC (permalink / raw)
  To: Chuanjia Liu
  Cc: Nicolas Boichat, Stephen Boyd,
	moderated list:ARM/Mediatek SoC support, Sean Wang,
	Linus Walleij, Matthias Brugger, linux-gpio,
	linux-arm Mailing List, lkml, Thomas Gleixner

On Fri, May 31, 2019 at 1:06 AM Chuanjia Liu <Chuanjia.Liu@mediatek.com> wrote:
>
> On Thu, 2019-05-30 at 10:12 -0700, Evan Green wrote:
> > On Wed, May 15, 2019 at 1:05 AM Nicolas Boichat <drinkcat@chromium.org> wrote:
> > >
> > > On Wed, May 15, 2019 at 4:14 AM Stephen Boyd <swboyd@chromium.org> wrote:
> > > >
> > > > Quoting Nicolas Boichat (2019-05-13 18:37:58)
> > > > > On Tue, May 14, 2019 at 6:29 AM Stephen Boyd <swboyd@chromium.org> wrote:
> > > > > >
> > > > > > Quoting Nicolas Boichat (2019-04-28 20:55:15)
> > > > > > > During suspend/resume, mtk_eint_mask may be called while
> > > > > > > wake_mask is active. For example, this happens if a wake-source
> > > > > > > with an active interrupt handler wakes the system:
> > > > > > > irq/pm.c:irq_pm_check_wakeup would disable the interrupt, so
> > > > > > > that it can be handled later on in the resume flow.
> > > > > > >
> > > > > > > However, this may happen before mtk_eint_do_resume is called:
> > > > > > > in this case, wake_mask is loaded, and cur_mask is restored
> > > > > > > from an older copy, re-enabling the interrupt, and causing
> > > > > > > an interrupt storm (especially for level interrupts).
> > > > > > >
> > > > > > > Instead, we just record mask/unmask changes in cur_mask. This
> > > > > > > also avoids the need to read the current mask in eint_do_suspend,
> > > > > > > and we can remove mtk_eint_chip_read_mask function.
> > > > > > >
> > > > > > > Signed-off-by: Nicolas Boichat <drinkcat@chromium.org>
> > > > > >
> > > > > > It looks an awful lot like you should just use IRQCHIP_MASK_ON_SUSPEND
> > > > > > here. Isn't that what's happening? All non-wake irqs should be masked at
> > > > > > the hardware level so they can't cause a wakeup during suspend and on
> > > > > > resume they can be unmasked?
> > > > >
> > > > > No, this is for an line that has both wake and interrupt enabled. To
> > > > > reword the commit message above:
> > > >
> > > > Is my understanding correct that there isn't a different "wake up"
> > > > register that can be written to cause a GPIO to be configured to wake
> > > > the system from suspend? The only way to do so is to leave the GPIO
> > > > unmasked in the hardware by having EINT_EN[irq] = 1? And thus any
> > > > interrupts that we don't want to wake us up during suspend should be
> > > > masked in the hardware?
> > >
> > > Yes, that's my understanding as well.
> > >
> > > And then, what this driver does is to emulate the behaviour of a
> > > controller that would actually have separate irq and wake enable
> > > registers.
> > >
> > > > If that's true, the code here that's trying to keep track of enabled
> > > > irqs and wakeup enabled irqs can be replaced with the irqchip flag so
> > > > that wakeup irqs are not masked while non-wakeups are masked.
> > >
> > > Correct, but with the caveat that I don't see anything that definitely
> > > requires an interrupt to be enabled to be a wake source. See below...
> > >
> > > >
> > > > >  1. cur_mask[irq] = 1; wake_mask[irq] = 1; EINT_EN[irq] = 1 (interrupt
> > > > > enabled at hardware level)
> > > > >  2. System suspends, resumes due to that line (at this stage EINT_HW
> > > > > == wake_mask)
> > > > >  3. irq_pm_check_wakeup is called, and disables the interrupt =>
> > > > > EINT_EN[irq] = 0, but we still have cur_mask[irq] = 1
> > > > >  4. mtk_eint_do_resume is called, and restores EINT_EN = cur_mask, so
> > > > > it reenables EINT_EN[irq] = 1 => interrupt storm.
> > > > >
> > > > > This patch fixes the issue in step 3. So that the interrupt can be
> > > > > re-enabled properly later on, sometimes after mtk_eint_do_resume, when
> > > > > the driver is ready to handle it.
> > > >
> > > > Right, we'd rather not see irqchip drivers working around the genirq
> > > > layer to do these things like tracking cur_mask and wake_mask. That
> > > > leads to subtle bugs and makes the driver maintain state across the
> > > > irqchip callbacks and system suspend/resume.
> > > >
> > > > >
> > > > > Also, IRQCHIP_MASK_ON_SUSPEND does not handle lines that are enabled
> > > > > as a wake source, but without interrupt enabled (e.g. cros_ec driver
> > > > > does that), which we do want to support.
> > > >
> > > > Hmm. I thought that even if the irq is disabled by a driver, that would
> > > > be a lazy disable so it isn't really masked in the hardware. Then if an
> > > > interrupt comes in during suspend on a wake configured irq line, the
> > > > hardware will have left it unmasked because IRQCHIP_MASK_ON_SUSPEND in
> > > > combination with lazy disable would mean that the line is left unmasked
> > > > (ignoring whatever this mediatek driver is doing to mask and unmask in
> > > > PM hooks).
> > >
> > > At the very least, that's not what happens with this system. The
> > > interrupt is definitely not kept enabled in suspend, and the system
> > > would not wake from an EC interrupt. (see also this series, BTW:
> > > https://patchwork.kernel.org/cover/10921121/).
> > >
> > > > Just reading Documentation/power/suspend-and-interrupts.txt I'm led to
> > > > believe that the cros_ec driver shouldn't call disable_irq() on the
> > > > interrupt if it wants to wakeup from it:
> > > >
> > > > "Calling enable_irq_wake() causes suspend_device_irqs() to treat the
> > > > given IRQ in a special way.  Namely, the IRQ remains enabled, by on the
> > > > first interrupt it will be disabled, marked as pending and "suspended"
> > > > so that it will be re-enabled by resume_device_irqs() during the
> > > > subsequent system resume.  Also the PM core is notified about the event
> > > > which causes the system suspend in progress to be aborted (that doesn't
> > > > have to happen immediately, but at one of the points where the suspend
> > > > thread looks for pending wakeup events)."
> > >
> > > I think this describes the behaviour when you keep both enabled.
> > >
> > > > I suppose the problem is an irq line disabled in hardware that has
> > > > wakeup armed on it? Is this even valid? Shouldn't an irq be enabled for
> > > > wakeup to work?
> > >
> > > I couldn't really find a definite answer, but there are a bunch of
> > > examples of other drivers in the kernel:
> > >  - drivers/extcon/extcon-usb-gpio.c:usb_extcon_suspend
> > >  - drivers/hid/i2c-hid/i2c-hid.c:i2c_hid_suspend
> > >  - drivers/mfd/max77843.c:max77843_suspend
> > > (not exhaustive, this is quite hard to grep for...)
> > >
> > > > We could immediately unmask those lines in the hardware when the
> > > > set_wake() callback is called. That way the genirq layer can use the
> > > > driver to do what it wants with the hardware and the driver can make
> > > > sure that set_wake() will always cause the wakeup interrupt to be
> > > > delivered to genirq even when software has disabled it.
> > > >
> > > > But I think that there might be a problem with how genirq understands
> > > > the masked state of a line when the wakeup implementation conflates
> > > > masked state with wakeup armed state. Consider this call-flow:
> > > >
> > > >         irq masked in hardware, IRQD_IRQ_MASKED is set
> > > >         enable_irq_wake()
> > > >           unmask_irq in hardware
> > > >         IRQD_WAKEUP_ARMED is set
> > > >         <suspend and wakeup from irq>
> > > >         handle_level_irq()
> > > >           mask_ack_irq()
> > > >             mask_irq()
> > > >               if (irqd_irq_masked()) -> returns true and skips masking!
> > > >             if (desc->irq_data.chip->irq_ack)
> > > >               ...
> > > >           irq_may_run()
> > > >             irq_pm_check_wakeup()
> > > >               irq_disable()
> > > >                 mask_irq() -> does nothing again
> > > >
> > > > In the above flow, we never mask the irq because we thought it was
> > > > already masked when it was disabled, but the irqchip implementation
> > > > unmasked it to make wakeup work. Maybe we should always mask the irq if
> > > > wakeup is armed and we're trying to call mask_irq()? Looks hacky.
> Maybe we can implement irqchip's mask_ack_irq  in mediatek driver to
> always mask the irq. Then flow will always call it without judgment
> IRQD_IRQ_MASKED.
>
> diff --git a/drivers/pinctrl/mediatek/mtk-eint.c
> b/drivers/pinctrl/mediatek/mtk-
> index f464f8c..9f1aae2 100644
> --- a/drivers/pinctrl/mediatek/mtk-eint.c
> +++ b/drivers/pinctrl/mediatek/mtk-eint.c
> @@ -272,12 +272,19 @@ static void mtk_eint_irq_release_resources(struct
> irq_data
>         gpiochip_unlock_as_irq(gpio_c, gpio_n);
>  }
>
> +static void mtk_eint_mask_ack(struct irq_data *d)
> +{
> +       mtk_eint_mask(d);
> +       mtk_eint_ack(d);
> +}
> +
>  static struct irq_chip mtk_eint_irq_chip = {
>         .name = "mt-eint",
>         .irq_disable = mtk_eint_mask,
>         .irq_mask = mtk_eint_mask,
>         .irq_unmask = mtk_eint_unmask,
>         .irq_ack = mtk_eint_ack,
> +       .irq_mask_ack = mtk_eint_mask_ack,
>         .irq_set_type = mtk_eint_set_type,
>         .irq_set_wake = mtk_eint_irq_set_wake,
>         .irq_request_resources = mtk_eint_irq_request_resources,
>
> This seems like a small change.
> thanks.

Does this work? My understanding is that Linux thinks the irq is
_already_ masked, so it short-circuits in the generic IRQ code and
doesn't call mask again.
-Evan

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 2/2] pinctrl: mediatek: Update cur_mask in mask/mask ops
  2019-05-31 17:17               ` Evan Green
@ 2019-06-03  8:01                 ` Chuanjia Liu
  2019-06-04  0:57                   ` Nicolas Boichat
  0 siblings, 1 reply; 18+ messages in thread
From: Chuanjia Liu @ 2019-06-03  8:01 UTC (permalink / raw)
  To: Evan Green
  Cc: Nicolas Boichat, Stephen Boyd,
	moderated list:ARM/Mediatek SoC support, Sean Wang,
	Linus Walleij, Matthias Brugger, linux-gpio,
	linux-arm Mailing List, lkml, Thomas Gleixner

On Fri, 2019-05-31 at 10:17 -0700, Evan Green wrote:
> On Fri, May 31, 2019 at 1:06 AM Chuanjia Liu <Chuanjia.Liu@mediatek.com> wrote:
> >
> > On Thu, 2019-05-30 at 10:12 -0700, Evan Green wrote:
> > > On Wed, May 15, 2019 at 1:05 AM Nicolas Boichat <drinkcat@chromium.org> wrote:
> > > >
> > > > On Wed, May 15, 2019 at 4:14 AM Stephen Boyd <swboyd@chromium.org> wrote:
> > > > >
> > > > > Quoting Nicolas Boichat (2019-05-13 18:37:58)
> > > > > > On Tue, May 14, 2019 at 6:29 AM Stephen Boyd <swboyd@chromium.org> wrote:
> > > > > > >
> > > > > > > Quoting Nicolas Boichat (2019-04-28 20:55:15)
> > > > > > > > During suspend/resume, mtk_eint_mask may be called while
> > > > > > > > wake_mask is active. For example, this happens if a wake-source
> > > > > > > > with an active interrupt handler wakes the system:
> > > > > > > > irq/pm.c:irq_pm_check_wakeup would disable the interrupt, so
> > > > > > > > that it can be handled later on in the resume flow.
> > > > > > > >
> > > > > > > > However, this may happen before mtk_eint_do_resume is called:
> > > > > > > > in this case, wake_mask is loaded, and cur_mask is restored
> > > > > > > > from an older copy, re-enabling the interrupt, and causing
> > > > > > > > an interrupt storm (especially for level interrupts).
> > > > > > > >
> > > > > > > > Instead, we just record mask/unmask changes in cur_mask. This
> > > > > > > > also avoids the need to read the current mask in eint_do_suspend,
> > > > > > > > and we can remove mtk_eint_chip_read_mask function.
> > > > > > > >
> > > > > > > > Signed-off-by: Nicolas Boichat <drinkcat@chromium.org>
> > > > > > >
> > > > > > > It looks an awful lot like you should just use IRQCHIP_MASK_ON_SUSPEND
> > > > > > > here. Isn't that what's happening? All non-wake irqs should be masked at
> > > > > > > the hardware level so they can't cause a wakeup during suspend and on
> > > > > > > resume they can be unmasked?
> > > > > >
> > > > > > No, this is for an line that has both wake and interrupt enabled. To
> > > > > > reword the commit message above:
> > > > >
> > > > > Is my understanding correct that there isn't a different "wake up"
> > > > > register that can be written to cause a GPIO to be configured to wake
> > > > > the system from suspend? The only way to do so is to leave the GPIO
> > > > > unmasked in the hardware by having EINT_EN[irq] = 1? And thus any
> > > > > interrupts that we don't want to wake us up during suspend should be
> > > > > masked in the hardware?
> > > >
> > > > Yes, that's my understanding as well.
> > > >
> > > > And then, what this driver does is to emulate the behaviour of a
> > > > controller that would actually have separate irq and wake enable
> > > > registers.
> > > >
> > > > > If that's true, the code here that's trying to keep track of enabled
> > > > > irqs and wakeup enabled irqs can be replaced with the irqchip flag so
> > > > > that wakeup irqs are not masked while non-wakeups are masked.
> > > >
> > > > Correct, but with the caveat that I don't see anything that definitely
> > > > requires an interrupt to be enabled to be a wake source. See below...
> > > >
> > > > >
> > > > > >  1. cur_mask[irq] = 1; wake_mask[irq] = 1; EINT_EN[irq] = 1 (interrupt
> > > > > > enabled at hardware level)
> > > > > >  2. System suspends, resumes due to that line (at this stage EINT_HW
> > > > > > == wake_mask)
> > > > > >  3. irq_pm_check_wakeup is called, and disables the interrupt =>
> > > > > > EINT_EN[irq] = 0, but we still have cur_mask[irq] = 1
> > > > > >  4. mtk_eint_do_resume is called, and restores EINT_EN = cur_mask, so
> > > > > > it reenables EINT_EN[irq] = 1 => interrupt storm.
> > > > > >
> > > > > > This patch fixes the issue in step 3. So that the interrupt can be
> > > > > > re-enabled properly later on, sometimes after mtk_eint_do_resume, when
> > > > > > the driver is ready to handle it.
> > > > >
> > > > > Right, we'd rather not see irqchip drivers working around the genirq
> > > > > layer to do these things like tracking cur_mask and wake_mask. That
> > > > > leads to subtle bugs and makes the driver maintain state across the
> > > > > irqchip callbacks and system suspend/resume.
> > > > >
> > > > > >
> > > > > > Also, IRQCHIP_MASK_ON_SUSPEND does not handle lines that are enabled
> > > > > > as a wake source, but without interrupt enabled (e.g. cros_ec driver
> > > > > > does that), which we do want to support.
> > > > >
> > > > > Hmm. I thought that even if the irq is disabled by a driver, that would
> > > > > be a lazy disable so it isn't really masked in the hardware. Then if an
> > > > > interrupt comes in during suspend on a wake configured irq line, the
> > > > > hardware will have left it unmasked because IRQCHIP_MASK_ON_SUSPEND in
> > > > > combination with lazy disable would mean that the line is left unmasked
> > > > > (ignoring whatever this mediatek driver is doing to mask and unmask in
> > > > > PM hooks).
> > > >
> > > > At the very least, that's not what happens with this system. The
> > > > interrupt is definitely not kept enabled in suspend, and the system
> > > > would not wake from an EC interrupt. (see also this series, BTW:
> > > > https://patchwork.kernel.org/cover/10921121/).
> > > >
> > > > > Just reading Documentation/power/suspend-and-interrupts.txt I'm led to
> > > > > believe that the cros_ec driver shouldn't call disable_irq() on the
> > > > > interrupt if it wants to wakeup from it:
> > > > >
> > > > > "Calling enable_irq_wake() causes suspend_device_irqs() to treat the
> > > > > given IRQ in a special way.  Namely, the IRQ remains enabled, by on the
> > > > > first interrupt it will be disabled, marked as pending and "suspended"
> > > > > so that it will be re-enabled by resume_device_irqs() during the
> > > > > subsequent system resume.  Also the PM core is notified about the event
> > > > > which causes the system suspend in progress to be aborted (that doesn't
> > > > > have to happen immediately, but at one of the points where the suspend
> > > > > thread looks for pending wakeup events)."
> > > >
> > > > I think this describes the behaviour when you keep both enabled.
> > > >
> > > > > I suppose the problem is an irq line disabled in hardware that has
> > > > > wakeup armed on it? Is this even valid? Shouldn't an irq be enabled for
> > > > > wakeup to work?
> > > >
> > > > I couldn't really find a definite answer, but there are a bunch of
> > > > examples of other drivers in the kernel:
> > > >  - drivers/extcon/extcon-usb-gpio.c:usb_extcon_suspend
> > > >  - drivers/hid/i2c-hid/i2c-hid.c:i2c_hid_suspend
> > > >  - drivers/mfd/max77843.c:max77843_suspend
> > > > (not exhaustive, this is quite hard to grep for...)
> > > >
> > > > > We could immediately unmask those lines in the hardware when the
> > > > > set_wake() callback is called. That way the genirq layer can use the
> > > > > driver to do what it wants with the hardware and the driver can make
> > > > > sure that set_wake() will always cause the wakeup interrupt to be
> > > > > delivered to genirq even when software has disabled it.
> > > > >
> > > > > But I think that there might be a problem with how genirq understands
> > > > > the masked state of a line when the wakeup implementation conflates
> > > > > masked state with wakeup armed state. Consider this call-flow:
> > > > >
> > > > >         irq masked in hardware, IRQD_IRQ_MASKED is set
> > > > >         enable_irq_wake()
> > > > >           unmask_irq in hardware
> > > > >         IRQD_WAKEUP_ARMED is set
> > > > >         <suspend and wakeup from irq>
> > > > >         handle_level_irq()
> > > > >           mask_ack_irq()
> > > > >             mask_irq()
> > > > >               if (irqd_irq_masked()) -> returns true and skips masking!
> > > > >             if (desc->irq_data.chip->irq_ack)
> > > > >               ...
> > > > >           irq_may_run()
> > > > >             irq_pm_check_wakeup()
> > > > >               irq_disable()
> > > > >                 mask_irq() -> does nothing again
> > > > >
> > > > > In the above flow, we never mask the irq because we thought it was
> > > > > already masked when it was disabled, but the irqchip implementation
> > > > > unmasked it to make wakeup work. Maybe we should always mask the irq if
> > > > > wakeup is armed and we're trying to call mask_irq()? Looks hacky.
> > Maybe we can implement irqchip's mask_ack_irq  in mediatek driver to
> > always mask the irq. Then flow will always call it without judgment
> > IRQD_IRQ_MASKED.
> >
> > diff --git a/drivers/pinctrl/mediatek/mtk-eint.c
> > b/drivers/pinctrl/mediatek/mtk-
> > index f464f8c..9f1aae2 100644
> > --- a/drivers/pinctrl/mediatek/mtk-eint.c
> > +++ b/drivers/pinctrl/mediatek/mtk-eint.c
> > @@ -272,12 +272,19 @@ static void mtk_eint_irq_release_resources(struct
> > irq_data
> >         gpiochip_unlock_as_irq(gpio_c, gpio_n);
> >  }
> >
> > +static void mtk_eint_mask_ack(struct irq_data *d)
> > +{
> > +       mtk_eint_mask(d);
> > +       mtk_eint_ack(d);
> > +}
> > +
> >  static struct irq_chip mtk_eint_irq_chip = {
> >         .name = "mt-eint",
> >         .irq_disable = mtk_eint_mask,
> >         .irq_mask = mtk_eint_mask,
> >         .irq_unmask = mtk_eint_unmask,
> >         .irq_ack = mtk_eint_ack,
> > +       .irq_mask_ack = mtk_eint_mask_ack,
> >         .irq_set_type = mtk_eint_set_type,
> >         .irq_set_wake = mtk_eint_irq_set_wake,
> >         .irq_request_resources = mtk_eint_irq_request_resources,
> >
> > This seems like a small change.
> > thanks.
> 
> Does this work? My understanding is that Linux thinks the irq is
> _already_ masked, so it short-circuits in the generic IRQ code and
> doesn't call mask again.
> -Evan

Yes, you are right.

The underlying problem is really that the hardware IRQ enabled state is
out of sync with what linux thinks.In resume flow,Linux thinks the irq
is _already_masked, so it short-circuits in the generic IRQ code and
doesn't call mask again.So in step 3 will have a interrupt storm.

But we implement irqchip's mask_ack_irq so that mask_ack_irq() calls
desc->irq_data.chip->irq_mask_ack instead of mask_irq() that needs to
judge IRQD_IRQ_MASKED. This will correctly set cur_mask[irq] = 0 to 
sync with kernel state.

Also, this patch can solve the issue of [1/2] in this patchset[1] which
also is the interrupt mask cannot be set correctly due to hardware irq
state not sync kernel.

[1] https://patchwork.kernel.org/patch/10921143/



^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 2/2] pinctrl: mediatek: Update cur_mask in mask/mask ops
  2019-06-03  8:01                 ` Chuanjia Liu
@ 2019-06-04  0:57                   ` Nicolas Boichat
  2019-06-05  7:11                     ` Chuanjia Liu
  0 siblings, 1 reply; 18+ messages in thread
From: Nicolas Boichat @ 2019-06-04  0:57 UTC (permalink / raw)
  To: Chuanjia Liu
  Cc: Evan Green, Stephen Boyd,
	moderated list:ARM/Mediatek SoC support, Sean Wang,
	Linus Walleij, Matthias Brugger, open list:GPIO SUBSYSTEM,
	linux-arm Mailing List, lkml, Thomas Gleixner

On Mon, Jun 3, 2019 at 4:01 PM Chuanjia Liu <Chuanjia.Liu@mediatek.com> wrote:
>
> On Fri, 2019-05-31 at 10:17 -0700, Evan Green wrote:
> > On Fri, May 31, 2019 at 1:06 AM Chuanjia Liu <Chuanjia.Liu@mediatek.com> wrote:
> > >
> > > On Thu, 2019-05-30 at 10:12 -0700, Evan Green wrote:
> > > > On Wed, May 15, 2019 at 1:05 AM Nicolas Boichat <drinkcat@chromium.org> wrote:
> > > > >
> > > > > On Wed, May 15, 2019 at 4:14 AM Stephen Boyd <swboyd@chromium.org> wrote:
> > > > > >
> > > > > > Quoting Nicolas Boichat (2019-05-13 18:37:58)
> > > > > > > On Tue, May 14, 2019 at 6:29 AM Stephen Boyd <swboyd@chromium.org> wrote:
> > > > > > > >
> > > > > > > > Quoting Nicolas Boichat (2019-04-28 20:55:15)
> > > > > > > > > During suspend/resume, mtk_eint_mask may be called while
> > > > > > > > > wake_mask is active. For example, this happens if a wake-source
> > > > > > > > > with an active interrupt handler wakes the system:
> > > > > > > > > irq/pm.c:irq_pm_check_wakeup would disable the interrupt, so
> > > > > > > > > that it can be handled later on in the resume flow.
> > > > > > > > >
> > > > > > > > > However, this may happen before mtk_eint_do_resume is called:
> > > > > > > > > in this case, wake_mask is loaded, and cur_mask is restored
> > > > > > > > > from an older copy, re-enabling the interrupt, and causing
> > > > > > > > > an interrupt storm (especially for level interrupts).
> > > > > > > > >
> > > > > > > > > Instead, we just record mask/unmask changes in cur_mask. This
> > > > > > > > > also avoids the need to read the current mask in eint_do_suspend,
> > > > > > > > > and we can remove mtk_eint_chip_read_mask function.
> > > > > > > > >
> > > > > > > > > Signed-off-by: Nicolas Boichat <drinkcat@chromium.org>
> > > > > > > >
> > > > > > > > It looks an awful lot like you should just use IRQCHIP_MASK_ON_SUSPEND
> > > > > > > > here. Isn't that what's happening? All non-wake irqs should be masked at
> > > > > > > > the hardware level so they can't cause a wakeup during suspend and on
> > > > > > > > resume they can be unmasked?
> > > > > > >
> > > > > > > No, this is for an line that has both wake and interrupt enabled. To
> > > > > > > reword the commit message above:
> > > > > >
> > > > > > Is my understanding correct that there isn't a different "wake up"
> > > > > > register that can be written to cause a GPIO to be configured to wake
> > > > > > the system from suspend? The only way to do so is to leave the GPIO
> > > > > > unmasked in the hardware by having EINT_EN[irq] = 1? And thus any
> > > > > > interrupts that we don't want to wake us up during suspend should be
> > > > > > masked in the hardware?
> > > > >
> > > > > Yes, that's my understanding as well.
> > > > >
> > > > > And then, what this driver does is to emulate the behaviour of a
> > > > > controller that would actually have separate irq and wake enable
> > > > > registers.
> > > > >
> > > > > > If that's true, the code here that's trying to keep track of enabled
> > > > > > irqs and wakeup enabled irqs can be replaced with the irqchip flag so
> > > > > > that wakeup irqs are not masked while non-wakeups are masked.
> > > > >
> > > > > Correct, but with the caveat that I don't see anything that definitely
> > > > > requires an interrupt to be enabled to be a wake source. See below...
> > > > >
> > > > > >
> > > > > > >  1. cur_mask[irq] = 1; wake_mask[irq] = 1; EINT_EN[irq] = 1 (interrupt
> > > > > > > enabled at hardware level)
> > > > > > >  2. System suspends, resumes due to that line (at this stage EINT_HW
> > > > > > > == wake_mask)
> > > > > > >  3. irq_pm_check_wakeup is called, and disables the interrupt =>
> > > > > > > EINT_EN[irq] = 0, but we still have cur_mask[irq] = 1
> > > > > > >  4. mtk_eint_do_resume is called, and restores EINT_EN = cur_mask, so
> > > > > > > it reenables EINT_EN[irq] = 1 => interrupt storm.
> > > > > > >
> > > > > > > This patch fixes the issue in step 3. So that the interrupt can be
> > > > > > > re-enabled properly later on, sometimes after mtk_eint_do_resume, when
> > > > > > > the driver is ready to handle it.
> > > > > >
> > > > > > Right, we'd rather not see irqchip drivers working around the genirq
> > > > > > layer to do these things like tracking cur_mask and wake_mask. That
> > > > > > leads to subtle bugs and makes the driver maintain state across the
> > > > > > irqchip callbacks and system suspend/resume.
> > > > > >
> > > > > > >
> > > > > > > Also, IRQCHIP_MASK_ON_SUSPEND does not handle lines that are enabled
> > > > > > > as a wake source, but without interrupt enabled (e.g. cros_ec driver
> > > > > > > does that), which we do want to support.
> > > > > >
> > > > > > Hmm. I thought that even if the irq is disabled by a driver, that would
> > > > > > be a lazy disable so it isn't really masked in the hardware. Then if an
> > > > > > interrupt comes in during suspend on a wake configured irq line, the
> > > > > > hardware will have left it unmasked because IRQCHIP_MASK_ON_SUSPEND in
> > > > > > combination with lazy disable would mean that the line is left unmasked
> > > > > > (ignoring whatever this mediatek driver is doing to mask and unmask in
> > > > > > PM hooks).
> > > > >
> > > > > At the very least, that's not what happens with this system. The
> > > > > interrupt is definitely not kept enabled in suspend, and the system
> > > > > would not wake from an EC interrupt. (see also this series, BTW:
> > > > > https://patchwork.kernel.org/cover/10921121/).
> > > > >
> > > > > > Just reading Documentation/power/suspend-and-interrupts.txt I'm led to
> > > > > > believe that the cros_ec driver shouldn't call disable_irq() on the
> > > > > > interrupt if it wants to wakeup from it:
> > > > > >
> > > > > > "Calling enable_irq_wake() causes suspend_device_irqs() to treat the
> > > > > > given IRQ in a special way.  Namely, the IRQ remains enabled, by on the
> > > > > > first interrupt it will be disabled, marked as pending and "suspended"
> > > > > > so that it will be re-enabled by resume_device_irqs() during the
> > > > > > subsequent system resume.  Also the PM core is notified about the event
> > > > > > which causes the system suspend in progress to be aborted (that doesn't
> > > > > > have to happen immediately, but at one of the points where the suspend
> > > > > > thread looks for pending wakeup events)."
> > > > >
> > > > > I think this describes the behaviour when you keep both enabled.
> > > > >
> > > > > > I suppose the problem is an irq line disabled in hardware that has
> > > > > > wakeup armed on it? Is this even valid? Shouldn't an irq be enabled for
> > > > > > wakeup to work?
> > > > >
> > > > > I couldn't really find a definite answer, but there are a bunch of
> > > > > examples of other drivers in the kernel:
> > > > >  - drivers/extcon/extcon-usb-gpio.c:usb_extcon_suspend
> > > > >  - drivers/hid/i2c-hid/i2c-hid.c:i2c_hid_suspend
> > > > >  - drivers/mfd/max77843.c:max77843_suspend
> > > > > (not exhaustive, this is quite hard to grep for...)
> > > > >
> > > > > > We could immediately unmask those lines in the hardware when the
> > > > > > set_wake() callback is called. That way the genirq layer can use the
> > > > > > driver to do what it wants with the hardware and the driver can make
> > > > > > sure that set_wake() will always cause the wakeup interrupt to be
> > > > > > delivered to genirq even when software has disabled it.
> > > > > >
> > > > > > But I think that there might be a problem with how genirq understands
> > > > > > the masked state of a line when the wakeup implementation conflates
> > > > > > masked state with wakeup armed state. Consider this call-flow:
> > > > > >
> > > > > >         irq masked in hardware, IRQD_IRQ_MASKED is set
> > > > > >         enable_irq_wake()
> > > > > >           unmask_irq in hardware
> > > > > >         IRQD_WAKEUP_ARMED is set
> > > > > >         <suspend and wakeup from irq>
> > > > > >         handle_level_irq()
> > > > > >           mask_ack_irq()
> > > > > >             mask_irq()
> > > > > >               if (irqd_irq_masked()) -> returns true and skips masking!
> > > > > >             if (desc->irq_data.chip->irq_ack)
> > > > > >               ...
> > > > > >           irq_may_run()
> > > > > >             irq_pm_check_wakeup()
> > > > > >               irq_disable()
> > > > > >                 mask_irq() -> does nothing again
> > > > > >
> > > > > > In the above flow, we never mask the irq because we thought it was
> > > > > > already masked when it was disabled, but the irqchip implementation
> > > > > > unmasked it to make wakeup work. Maybe we should always mask the irq if
> > > > > > wakeup is armed and we're trying to call mask_irq()? Looks hacky.
> > > Maybe we can implement irqchip's mask_ack_irq  in mediatek driver to
> > > always mask the irq. Then flow will always call it without judgment
> > > IRQD_IRQ_MASKED.
> > >
> > > diff --git a/drivers/pinctrl/mediatek/mtk-eint.c
> > > b/drivers/pinctrl/mediatek/mtk-
> > > index f464f8c..9f1aae2 100644
> > > --- a/drivers/pinctrl/mediatek/mtk-eint.c
> > > +++ b/drivers/pinctrl/mediatek/mtk-eint.c
> > > @@ -272,12 +272,19 @@ static void mtk_eint_irq_release_resources(struct
> > > irq_data
> > >         gpiochip_unlock_as_irq(gpio_c, gpio_n);
> > >  }
> > >
> > > +static void mtk_eint_mask_ack(struct irq_data *d)
> > > +{
> > > +       mtk_eint_mask(d);
> > > +       mtk_eint_ack(d);
> > > +}
> > > +
> > >  static struct irq_chip mtk_eint_irq_chip = {
> > >         .name = "mt-eint",
> > >         .irq_disable = mtk_eint_mask,
> > >         .irq_mask = mtk_eint_mask,
> > >         .irq_unmask = mtk_eint_unmask,
> > >         .irq_ack = mtk_eint_ack,
> > > +       .irq_mask_ack = mtk_eint_mask_ack,
> > >         .irq_set_type = mtk_eint_set_type,
> > >         .irq_set_wake = mtk_eint_irq_set_wake,
> > >         .irq_request_resources = mtk_eint_irq_request_resources,
> > >
> > > This seems like a small change.
> > > thanks.
> >
> > Does this work? My understanding is that Linux thinks the irq is
> > _already_ masked, so it short-circuits in the generic IRQ code and
> > doesn't call mask again.
> > -Evan
>
> Yes, you are right.
>
> The underlying problem is really that the hardware IRQ enabled state is
> out of sync with what linux thinks.In resume flow,Linux thinks the irq
> is _already_masked, so it short-circuits in the generic IRQ code and
> doesn't call mask again.So in step 3 will have a interrupt storm.
>
> But we implement irqchip's mask_ack_irq so that mask_ack_irq() calls
> desc->irq_data.chip->irq_mask_ack instead of mask_irq() that needs to
> judge IRQD_IRQ_MASKED. This will correctly set cur_mask[irq] = 0 to
> sync with kernel state.

Oh, I see. I guess that would work as mask_ack_irq does not check for
current masked status before calling irq_chip->irq_mask_ack... But
that's only if irq_chip->irq_mask_ack is defined, else it just calls
mask_irq which does that check.

But if we follow bf22ff45bed ("genirq: Avoid unnecessary low level irq
function calls"), this is actually a bug and we should add that same
check to mask_ack_irq (and since pinctrl-rockchip.c does not implement
irq_mask_ack, it's understandable why the author missed that).

So I'm not sure if this is the right change either...

> Also, this patch can solve the issue of [1/2] in this patchset[1] which
> also is the interrupt mask cannot be set correctly due to hardware irq
> state not sync kernel.
>
> [1] https://patchwork.kernel.org/patch/10921143/
>
>

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 2/2] pinctrl: mediatek: Update cur_mask in mask/mask ops
  2019-06-04  0:57                   ` Nicolas Boichat
@ 2019-06-05  7:11                     ` Chuanjia Liu
  0 siblings, 0 replies; 18+ messages in thread
From: Chuanjia Liu @ 2019-06-05  7:11 UTC (permalink / raw)
  To: Nicolas Boichat, Sean Wang
  Cc: Evan Green, Stephen Boyd,
	moderated list:ARM/Mediatek SoC support, Sean Wang,
	Linus Walleij, Matthias Brugger, open list:GPIO SUBSYSTEM,
	linux-arm Mailing List, lkml, Thomas Gleixner

On Tue, 2019-06-04 at 08:57 +0800, Nicolas Boichat wrote:
> On Mon, Jun 3, 2019 at 4:01 PM Chuanjia Liu <Chuanjia.Liu@mediatek.com> wrote:
> >
> > On Fri, 2019-05-31 at 10:17 -0700, Evan Green wrote:
> > > On Fri, May 31, 2019 at 1:06 AM Chuanjia Liu <Chuanjia.Liu@mediatek.com> wrote:
> > > >
> > > > On Thu, 2019-05-30 at 10:12 -0700, Evan Green wrote:
> > > > > On Wed, May 15, 2019 at 1:05 AM Nicolas Boichat <drinkcat@chromium.org> wrote:
> > > > > >
> > > > > > On Wed, May 15, 2019 at 4:14 AM Stephen Boyd <swboyd@chromium.org> wrote:
> > > > > > >
> > > > > > > Quoting Nicolas Boichat (2019-05-13 18:37:58)
> > > > > > > > On Tue, May 14, 2019 at 6:29 AM Stephen Boyd <swboyd@chromium.org> wrote:
> > > > > > > > >
> > > > > > > > > Quoting Nicolas Boichat (2019-04-28 20:55:15)
> > > > > > > > > > During suspend/resume, mtk_eint_mask may be called while
> > > > > > > > > > wake_mask is active. For example, this happens if a wake-source
> > > > > > > > > > with an active interrupt handler wakes the system:
> > > > > > > > > > irq/pm.c:irq_pm_check_wakeup would disable the interrupt, so
> > > > > > > > > > that it can be handled later on in the resume flow.
> > > > > > > > > >
> > > > > > > > > > However, this may happen before mtk_eint_do_resume is called:
> > > > > > > > > > in this case, wake_mask is loaded, and cur_mask is restored
> > > > > > > > > > from an older copy, re-enabling the interrupt, and causing
> > > > > > > > > > an interrupt storm (especially for level interrupts).
> > > > > > > > > >
> > > > > > > > > > Instead, we just record mask/unmask changes in cur_mask. This
> > > > > > > > > > also avoids the need to read the current mask in eint_do_suspend,
> > > > > > > > > > and we can remove mtk_eint_chip_read_mask function.
> > > > > > > > > >
> > > > > > > > > > Signed-off-by: Nicolas Boichat <drinkcat@chromium.org>
> > > > > > > > >
> > > > > > > > > It looks an awful lot like you should just use IRQCHIP_MASK_ON_SUSPEND
> > > > > > > > > here. Isn't that what's happening? All non-wake irqs should be masked at
> > > > > > > > > the hardware level so they can't cause a wakeup during suspend and on
> > > > > > > > > resume they can be unmasked?
> > > > > > > >
> > > > > > > > No, this is for an line that has both wake and interrupt enabled. To
> > > > > > > > reword the commit message above:
> > > > > > >
> > > > > > > Is my understanding correct that there isn't a different "wake up"
> > > > > > > register that can be written to cause a GPIO to be configured to wake
> > > > > > > the system from suspend? The only way to do so is to leave the GPIO
> > > > > > > unmasked in the hardware by having EINT_EN[irq] = 1? And thus any
> > > > > > > interrupts that we don't want to wake us up during suspend should be
> > > > > > > masked in the hardware?
> > > > > >
> > > > > > Yes, that's my understanding as well.
> > > > > >
> > > > > > And then, what this driver does is to emulate the behaviour of a
> > > > > > controller that would actually have separate irq and wake enable
> > > > > > registers.
> > > > > >
> > > > > > > If that's true, the code here that's trying to keep track of enabled
> > > > > > > irqs and wakeup enabled irqs can be replaced with the irqchip flag so
> > > > > > > that wakeup irqs are not masked while non-wakeups are masked.
> > > > > >
> > > > > > Correct, but with the caveat that I don't see anything that definitely
> > > > > > requires an interrupt to be enabled to be a wake source. See below...
> > > > > >
> > > > > > >
> > > > > > > >  1. cur_mask[irq] = 1; wake_mask[irq] = 1; EINT_EN[irq] = 1 (interrupt
> > > > > > > > enabled at hardware level)
> > > > > > > >  2. System suspends, resumes due to that line (at this stage EINT_HW
> > > > > > > > == wake_mask)
> > > > > > > >  3. irq_pm_check_wakeup is called, and disables the interrupt =>
> > > > > > > > EINT_EN[irq] = 0, but we still have cur_mask[irq] = 1
> > > > > > > >  4. mtk_eint_do_resume is called, and restores EINT_EN = cur_mask, so
> > > > > > > > it reenables EINT_EN[irq] = 1 => interrupt storm.
> > > > > > > >
> > > > > > > > This patch fixes the issue in step 3. So that the interrupt can be
> > > > > > > > re-enabled properly later on, sometimes after mtk_eint_do_resume, when
> > > > > > > > the driver is ready to handle it.
> > > > > > >
> > > > > > > Right, we'd rather not see irqchip drivers working around the genirq
> > > > > > > layer to do these things like tracking cur_mask and wake_mask. That
> > > > > > > leads to subtle bugs and makes the driver maintain state across the
> > > > > > > irqchip callbacks and system suspend/resume.
> > > > > > >
> > > > > > > >
> > > > > > > > Also, IRQCHIP_MASK_ON_SUSPEND does not handle lines that are enabled
> > > > > > > > as a wake source, but without interrupt enabled (e.g. cros_ec driver
> > > > > > > > does that), which we do want to support.
> > > > > > >
> > > > > > > Hmm. I thought that even if the irq is disabled by a driver, that would
> > > > > > > be a lazy disable so it isn't really masked in the hardware. Then if an
> > > > > > > interrupt comes in during suspend on a wake configured irq line, the
> > > > > > > hardware will have left it unmasked because IRQCHIP_MASK_ON_SUSPEND in
> > > > > > > combination with lazy disable would mean that the line is left unmasked
> > > > > > > (ignoring whatever this mediatek driver is doing to mask and unmask in
> > > > > > > PM hooks).
> > > > > >
> > > > > > At the very least, that's not what happens with this system. The
> > > > > > interrupt is definitely not kept enabled in suspend, and the system
> > > > > > would not wake from an EC interrupt. (see also this series, BTW:
> > > > > > https://patchwork.kernel.org/cover/10921121/).
> > > > > >
> > > > > > > Just reading Documentation/power/suspend-and-interrupts.txt I'm led to
> > > > > > > believe that the cros_ec driver shouldn't call disable_irq() on the
> > > > > > > interrupt if it wants to wakeup from it:
> > > > > > >
> > > > > > > "Calling enable_irq_wake() causes suspend_device_irqs() to treat the
> > > > > > > given IRQ in a special way.  Namely, the IRQ remains enabled, by on the
> > > > > > > first interrupt it will be disabled, marked as pending and "suspended"
> > > > > > > so that it will be re-enabled by resume_device_irqs() during the
> > > > > > > subsequent system resume.  Also the PM core is notified about the event
> > > > > > > which causes the system suspend in progress to be aborted (that doesn't
> > > > > > > have to happen immediately, but at one of the points where the suspend
> > > > > > > thread looks for pending wakeup events)."
> > > > > >
> > > > > > I think this describes the behaviour when you keep both enabled.
> > > > > >
> > > > > > > I suppose the problem is an irq line disabled in hardware that has
> > > > > > > wakeup armed on it? Is this even valid? Shouldn't an irq be enabled for
> > > > > > > wakeup to work?
> > > > > >
> > > > > > I couldn't really find a definite answer, but there are a bunch of
> > > > > > examples of other drivers in the kernel:
> > > > > >  - drivers/extcon/extcon-usb-gpio.c:usb_extcon_suspend
> > > > > >  - drivers/hid/i2c-hid/i2c-hid.c:i2c_hid_suspend
> > > > > >  - drivers/mfd/max77843.c:max77843_suspend
> > > > > > (not exhaustive, this is quite hard to grep for...)
> > > > > >
> > > > > > > We could immediately unmask those lines in the hardware when the
> > > > > > > set_wake() callback is called. That way the genirq layer can use the
> > > > > > > driver to do what it wants with the hardware and the driver can make
> > > > > > > sure that set_wake() will always cause the wakeup interrupt to be
> > > > > > > delivered to genirq even when software has disabled it.
> > > > > > >
> > > > > > > But I think that there might be a problem with how genirq understands
> > > > > > > the masked state of a line when the wakeup implementation conflates
> > > > > > > masked state with wakeup armed state. Consider this call-flow:
> > > > > > >
> > > > > > >         irq masked in hardware, IRQD_IRQ_MASKED is set
> > > > > > >         enable_irq_wake()
> > > > > > >           unmask_irq in hardware
> > > > > > >         IRQD_WAKEUP_ARMED is set
> > > > > > >         <suspend and wakeup from irq>
> > > > > > >         handle_level_irq()
> > > > > > >           mask_ack_irq()
> > > > > > >             mask_irq()
> > > > > > >               if (irqd_irq_masked()) -> returns true and skips masking!
> > > > > > >             if (desc->irq_data.chip->irq_ack)
> > > > > > >               ...
> > > > > > >           irq_may_run()
> > > > > > >             irq_pm_check_wakeup()
> > > > > > >               irq_disable()
> > > > > > >                 mask_irq() -> does nothing again
> > > > > > >
> > > > > > > In the above flow, we never mask the irq because we thought it was
> > > > > > > already masked when it was disabled, but the irqchip implementation
> > > > > > > unmasked it to make wakeup work. Maybe we should always mask the irq if
> > > > > > > wakeup is armed and we're trying to call mask_irq()? Looks hacky.
> > > > Maybe we can implement irqchip's mask_ack_irq  in mediatek driver to
> > > > always mask the irq. Then flow will always call it without judgment
> > > > IRQD_IRQ_MASKED.
> > > >
> > > > diff --git a/drivers/pinctrl/mediatek/mtk-eint.c
> > > > b/drivers/pinctrl/mediatek/mtk-
> > > > index f464f8c..9f1aae2 100644
> > > > --- a/drivers/pinctrl/mediatek/mtk-eint.c
> > > > +++ b/drivers/pinctrl/mediatek/mtk-eint.c
> > > > @@ -272,12 +272,19 @@ static void mtk_eint_irq_release_resources(struct
> > > > irq_data
> > > >         gpiochip_unlock_as_irq(gpio_c, gpio_n);
> > > >  }
> > > >
> > > > +static void mtk_eint_mask_ack(struct irq_data *d)
> > > > +{
> > > > +       mtk_eint_mask(d);
> > > > +       mtk_eint_ack(d);
> > > > +}
> > > > +
> > > >  static struct irq_chip mtk_eint_irq_chip = {
> > > >         .name = "mt-eint",
> > > >         .irq_disable = mtk_eint_mask,
> > > >         .irq_mask = mtk_eint_mask,
> > > >         .irq_unmask = mtk_eint_unmask,
> > > >         .irq_ack = mtk_eint_ack,
> > > > +       .irq_mask_ack = mtk_eint_mask_ack,
> > > >         .irq_set_type = mtk_eint_set_type,
> > > >         .irq_set_wake = mtk_eint_irq_set_wake,
> > > >         .irq_request_resources = mtk_eint_irq_request_resources,
> > > >
> > > > This seems like a small change.
> > > > thanks.
> > >
> > > Does this work? My understanding is that Linux thinks the irq is
> > > _already_ masked, so it short-circuits in the generic IRQ code and
> > > doesn't call mask again.
> > > -Evan
> >
> > Yes, you are right.
> >
> > The underlying problem is really that the hardware IRQ enabled state is
> > out of sync with what linux thinks.In resume flow,Linux thinks the irq
> > is _already_masked, so it short-circuits in the generic IRQ code and
> > doesn't call mask again.So in step 3 will have a interrupt storm.
> >
> > But we implement irqchip's mask_ack_irq so that mask_ack_irq() calls
> > desc->irq_data.chip->irq_mask_ack instead of mask_irq() that needs to
> > judge IRQD_IRQ_MASKED. This will correctly set cur_mask[irq] = 0 to
> > sync with kernel state.
> 
> Oh, I see. I guess that would work as mask_ack_irq does not check for
> current masked status before calling irq_chip->irq_mask_ack... But
> that's only if irq_chip->irq_mask_ack is defined, else it just calls
> mask_irq which does that check.
> 
yes,so we define irq_chip->irq_mask_ack to avoid check.

> But if we follow bf22ff45bed ("genirq: Avoid unnecessary low level irq
> function calls"), this is actually a bug and we should add that same
> check to mask_ack_irq (and since pinctrl-rockchip.c does not implement
> irq_mask_ack, it's understandable why the author missed that).
> 
If we added the same check to mask_ack_irq,it might cause ack_irq to
fail,because ack_irq should be executed regardless of state of the mask.
If we want avoid unnecessary low level irq function calls, we can choose
not define irq_chip->irq_mask_ack.In some application scenarios, taken
here, we can use this to call low level irq function to mask hwirq.

I'm also not sure what's the best approach here.
@sean,could you help comment this issue?
> So I'm not sure if this is the right change either...
> 
> > Also, this patch can solve the issue of [1/2] in this patchset[1] which
> > also is the interrupt mask cannot be set correctly due to hardware irq
> > state not sync kernel.
> >
> > [1] https://patchwork.kernel.org/patch/10921143/
> >
> >



^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 2/2] pinctrl: mediatek: Update cur_mask in mask/mask ops
  2019-05-30 17:12           ` Evan Green
  2019-05-31  8:05             ` Chuanjia Liu
@ 2019-06-10 22:25             ` Stephen Boyd
  1 sibling, 0 replies; 18+ messages in thread
From: Stephen Boyd @ 2019-06-10 22:25 UTC (permalink / raw)
  To: Evan Green, Nicolas Boichat
  Cc: moderated list:ARM/Mediatek SoC support, Sean Wang,
	Linus Walleij, Matthias Brugger, linux-gpio,
	linux-arm Mailing List, lkml, Chuanjia Liu, Thomas Gleixner

Quoting Evan Green (2019-05-30 10:12:03)
> On Wed, May 15, 2019 at 1:05 AM Nicolas Boichat <drinkcat@chromium.org> wrote:
> >
> > On Wed, May 15, 2019 at 4:14 AM Stephen Boyd <swboyd@chromium.org> wrote:
> >
> > > We could immediately unmask those lines in the hardware when the
> > > set_wake() callback is called. That way the genirq layer can use the
> > > driver to do what it wants with the hardware and the driver can make
> > > sure that set_wake() will always cause the wakeup interrupt to be
> > > delivered to genirq even when software has disabled it.
> > >
> > > But I think that there might be a problem with how genirq understands
> > > the masked state of a line when the wakeup implementation conflates
> > > masked state with wakeup armed state. Consider this call-flow:
> > >
> > >         irq masked in hardware, IRQD_IRQ_MASKED is set
> > >         enable_irq_wake()
> > >           unmask_irq in hardware
> > >         IRQD_WAKEUP_ARMED is set
> > >         <suspend and wakeup from irq>
> > >         handle_level_irq()
> > >           mask_ack_irq()
> > >             mask_irq()
> > >               if (irqd_irq_masked()) -> returns true and skips masking!
> > >             if (desc->irq_data.chip->irq_ack)
> > >               ...
> > >           irq_may_run()
> > >             irq_pm_check_wakeup()
> > >               irq_disable()
> > >                 mask_irq() -> does nothing again
> > >
> > > In the above flow, we never mask the irq because we thought it was
> > > already masked when it was disabled, but the irqchip implementation
> > > unmasked it to make wakeup work. Maybe we should always mask the irq if
> > > wakeup is armed and we're trying to call mask_irq()? Looks hacky.
> > >
> > > diff --git a/kernel/irq/chip.c b/kernel/irq/chip.c
> > > index 51128bea3846..20257d528880 100644
> > > --- a/kernel/irq/chip.c
> > > +++ b/kernel/irq/chip.c
> > > @@ -411,7 +411,7 @@ static inline void mask_ack_irq(struct irq_desc *desc)
> > >
> > >  void mask_irq(struct irq_desc *desc)
> > >  {
> > > -       if (irqd_irq_masked(&desc->irq_data))
> > > +       if (!irqd_is_wakeup_armed(&desc->irq_data) && irqd_irq_masked(&desc->irq_data))
> > >                 return;
> > >
> > >         if (desc->irq_data.chip->irq_mask) {
> >
> > I'm... really not sure what's the best approach here. But basically,
> > yes, if we can find a way to properly handle wake and interrupt
> > behaviour for drivers with a single mask, that'd be good.
> > IRQCHIP_MASK_ON_SUSPEND only seems to be doing half of the work, since
> > it does not cover the disable+wake source case.
> >
> > Thanks,
> 
> I finally got around to studying this patch. This series seems okay to
> me. The underlying problem is really that the hardware IRQ enabled
> state is out of sync with what Linux thinks. This happens during
> suspend because Linux thinks the irq is disabled, but due to the
> hardware constraints on this platform, the interrupt has to be enabled
> for it to be a wake source. So the mtk driver re-enables the
> interrupt, and then has to find a way to get back in sync with Linux's
> IRQ mask state.
> 
> One possible approach is mentioned above by Stephen: stop calling
> disable_irq in the cros EC driver. Then both linux and mtk agree the
> interrupt is enabled at suspend time. I think this ran into other
> problems though, where the EC gets its interrupt but is unable to
> silence it because the underlying SPI bus is still suspended.

Does this happen? I thought the interrupt would only be delivered once
all drivers have resumed from the noirq resume phase. Maybe the SPI
controller needs to resume there instead of in the normal resume path
and then this isn't a problem.

> 
> The other approach, taken here, is to mask the interrupt when it first
> comes in, getting Linux and mtk back in agreement that yes, the
> interrupt is masked. Outside of enlightening the generic IRQ core
> about these types of interrupts that need to get re-enabled to be wake
> sources, this seems like a reasonable approach.

I prefer we teach the genirq layer about these types of irqchips. That
way the solution is common and not a one-off fix for Mediatek chips. I'm
sure that this same problem will come up again for another SoC vendor
out there so having a chip flag that does the other half of
IRQCHIP_MASK_ON_SUSPEND would be appropriate.


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 2/2] pinctrl: mediatek: Update cur_mask in mask/mask ops
  2019-04-29  3:55 ` [PATCH 2/2] pinctrl: mediatek: Update cur_mask in mask/mask ops Nicolas Boichat
  2019-05-13 22:29   ` Stephen Boyd
@ 2019-06-21  4:20   ` Sean Wang
  2019-06-25 13:52     ` Linus Walleij
  1 sibling, 1 reply; 18+ messages in thread
From: Sean Wang @ 2019-06-21  4:20 UTC (permalink / raw)
  To: Nicolas Boichat
  Cc: moderated list:ARM/Mediatek SoC support, Linus Walleij,
	Matthias Brugger, open list:GPIO SUBSYSTEM,
	linux-arm Mailing List, lkml, Chuanjia Liu, Evan Green,
	Stephen Boyd

Hi, Nicolas

On Sun, Apr 28, 2019 at 8:55 PM Nicolas Boichat <drinkcat@chromium.org> wrote:
>
> During suspend/resume, mtk_eint_mask may be called while
> wake_mask is active. For example, this happens if a wake-source
> with an active interrupt handler wakes the system:
> irq/pm.c:irq_pm_check_wakeup would disable the interrupt, so
> that it can be handled later on in the resume flow.
>
> However, this may happen before mtk_eint_do_resume is called:
> in this case, wake_mask is loaded, and cur_mask is restored
> from an older copy, re-enabling the interrupt, and causing
> an interrupt storm (especially for level interrupts).
>
> Instead, we just record mask/unmask changes in cur_mask. This
> also avoids the need to read the current mask in eint_do_suspend,
> and we can remove mtk_eint_chip_read_mask function.
>

The change is worth rewording the commit message you added above as an instance
and adding Fixes tag as a fixup to mean you're fixing the existing
problem in the driver.

And then Acked-by: Sean Wang <sean.wang@kernel.org>

> Signed-off-by: Nicolas Boichat <drinkcat@chromium.org>
> ---
>  drivers/pinctrl/mediatek/mtk-eint.c | 18 ++++--------------
>  1 file changed, 4 insertions(+), 14 deletions(-)
>
> diff --git a/drivers/pinctrl/mediatek/mtk-eint.c b/drivers/pinctrl/mediatek/mtk-eint.c
> index 737385e86beb807..7e526bcf5e0b55c 100644
> --- a/drivers/pinctrl/mediatek/mtk-eint.c
> +++ b/drivers/pinctrl/mediatek/mtk-eint.c
> @@ -113,6 +113,8 @@ static void mtk_eint_mask(struct irq_data *d)
>         void __iomem *reg = mtk_eint_get_offset(eint, d->hwirq,
>                                                 eint->regs->mask_set);
>
> +       eint->cur_mask[d->hwirq >> 5] &= ~mask;
> +
>         writel(mask, reg);
>  }
>
> @@ -123,6 +125,8 @@ static void mtk_eint_unmask(struct irq_data *d)
>         void __iomem *reg = mtk_eint_get_offset(eint, d->hwirq,
>                                                 eint->regs->mask_clr);
>
> +       eint->cur_mask[d->hwirq >> 5] |= mask;
> +
>         writel(mask, reg);
>
>         if (eint->dual_edge[d->hwirq])
> @@ -217,19 +221,6 @@ static void mtk_eint_chip_write_mask(const struct mtk_eint *eint,
>         }
>  }
>
> -static void mtk_eint_chip_read_mask(const struct mtk_eint *eint,
> -                                   void __iomem *base, u32 *buf)
> -{
> -       int port;
> -       void __iomem *reg;
> -
> -       for (port = 0; port < eint->hw->ports; port++) {
> -               reg = base + eint->regs->mask + (port << 2);
> -               buf[port] = ~readl_relaxed(reg);
> -               /* Mask is 0 when irq is enabled, and 1 when disabled. */
> -       }
> -}
> -
>  static int mtk_eint_irq_request_resources(struct irq_data *d)
>  {
>         struct mtk_eint *eint = irq_data_get_irq_chip_data(d);
> @@ -384,7 +375,6 @@ static void mtk_eint_irq_handler(struct irq_desc *desc)
>
>  int mtk_eint_do_suspend(struct mtk_eint *eint)
>  {
> -       mtk_eint_chip_read_mask(eint, eint->base, eint->cur_mask);
>         mtk_eint_chip_write_mask(eint, eint->base, eint->wake_mask);
>
>         return 0;
> --
> 2.21.0.593.g511ec345e18-goog
>

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 1/2] pinctrl: mediatek: Ignore interrupts that are wake only during resume
  2019-04-29  3:55 ` [PATCH 1/2] pinctrl: mediatek: Ignore interrupts that are wake only during resume Nicolas Boichat
@ 2019-06-21  4:47   ` Sean Wang
  2019-06-25 13:50   ` Linus Walleij
  1 sibling, 0 replies; 18+ messages in thread
From: Sean Wang @ 2019-06-21  4:47 UTC (permalink / raw)
  To: Nicolas Boichat
  Cc: moderated list:ARM/Mediatek SoC support, Linus Walleij,
	Matthias Brugger, open list:GPIO SUBSYSTEM,
	linux-arm Mailing List, lkml, Chuanjia Liu, Evan Green,
	Stephen Boyd

Hi Nicolas,

On Sun, Apr 28, 2019 at 8:55 PM Nicolas Boichat <drinkcat@chromium.org> wrote:
>
> Before suspending, mtk-eint would set the interrupt mask to the
> one in wake_mask. However, some of these interrupts may not have a
> corresponding interrupt handler, or the interrupt may be disabled.
>
> On resume, the eint irq handler would trigger nevertheless,
> and irq/pm.c:irq_pm_check_wakeup would be called, which would
> try to call irq_disable. However, if the interrupt is not enabled
> (irqd_irq_disabled(&desc->irq_data) is true), the call does nothing,
> and the interrupt is left enabled in the eint driver.
>
> Especially for level-sensitive interrupts, this will lead to an
> interrupt storm on resume.
>
> If we detect that an interrupt is only in wake_mask, but not in
> cur_mask, we can just mask it out immediately (as mtk_eint_resume
> would do anyway at a later stage in the resume sequence, when
> restoring cur_mask).
>
> Fixes: bf22ff45bed ("genirq: Avoid unnecessary low level irq function calls")
> Signed-off-by: Nicolas Boichat <drinkcat@chromium.org>

Acked-by: Sean Wang <sean.wang@kernel.org>

> ---
>  drivers/pinctrl/mediatek/mtk-eint.c | 16 +++++++++++++++-
>  1 file changed, 15 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/pinctrl/mediatek/mtk-eint.c b/drivers/pinctrl/mediatek/mtk-eint.c
> index f464f8cd274b75c..737385e86beb807 100644
> --- a/drivers/pinctrl/mediatek/mtk-eint.c
> +++ b/drivers/pinctrl/mediatek/mtk-eint.c
> @@ -318,7 +318,7 @@ static void mtk_eint_irq_handler(struct irq_desc *desc)
>         struct irq_chip *chip = irq_desc_get_chip(desc);
>         struct mtk_eint *eint = irq_desc_get_handler_data(desc);
>         unsigned int status, eint_num;
> -       int offset, index, virq;
> +       int offset, mask_offset, index, virq;
>         void __iomem *reg =  mtk_eint_get_offset(eint, 0, eint->regs->stat);
>         int dual_edge, start_level, curr_level;
>
> @@ -328,10 +328,24 @@ static void mtk_eint_irq_handler(struct irq_desc *desc)
>                 status = readl(reg);
>                 while (status) {
>                         offset = __ffs(status);
> +                       mask_offset = eint_num >> 5;
>                         index = eint_num + offset;
>                         virq = irq_find_mapping(eint->domain, index);
>                         status &= ~BIT(offset);
>
> +                       /*
> +                        * If we get an interrupt on pin that was only required
> +                        * for wake (but no real interrupt requested), mask the
> +                        * interrupt (as would mtk_eint_resume do anyway later
> +                        * in the resume sequence).
> +                        */
> +                       if (eint->wake_mask[mask_offset] & BIT(offset) &&
> +                           !(eint->cur_mask[mask_offset] & BIT(offset))) {
> +                               writel_relaxed(BIT(offset), reg -
> +                                       eint->regs->stat +
> +                                       eint->regs->mask_set);
> +                       }
> +
>                         dual_edge = eint->dual_edge[index];
>                         if (dual_edge) {
>                                 /*
> --
> 2.21.0.593.g511ec345e18-goog
>

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 1/2] pinctrl: mediatek: Ignore interrupts that are wake only during resume
  2019-04-29  3:55 ` [PATCH 1/2] pinctrl: mediatek: Ignore interrupts that are wake only during resume Nicolas Boichat
  2019-06-21  4:47   ` Sean Wang
@ 2019-06-25 13:50   ` Linus Walleij
  1 sibling, 0 replies; 18+ messages in thread
From: Linus Walleij @ 2019-06-25 13:50 UTC (permalink / raw)
  To: Nicolas Boichat
  Cc: moderated list:ARM/Mediatek SoC support, Sean Wang,
	Matthias Brugger, open list:GPIO SUBSYSTEM, Linux ARM,
	linux-kernel, Chuanjia Liu, Evan Green, Stephen Boyd

On Mon, Apr 29, 2019 at 5:55 AM Nicolas Boichat <drinkcat@chromium.org> wrote:

> Before suspending, mtk-eint would set the interrupt mask to the
> one in wake_mask. However, some of these interrupts may not have a
> corresponding interrupt handler, or the interrupt may be disabled.
>
> On resume, the eint irq handler would trigger nevertheless,
> and irq/pm.c:irq_pm_check_wakeup would be called, which would
> try to call irq_disable. However, if the interrupt is not enabled
> (irqd_irq_disabled(&desc->irq_data) is true), the call does nothing,
> and the interrupt is left enabled in the eint driver.
>
> Especially for level-sensitive interrupts, this will lead to an
> interrupt storm on resume.
>
> If we detect that an interrupt is only in wake_mask, but not in
> cur_mask, we can just mask it out immediately (as mtk_eint_resume
> would do anyway at a later stage in the resume sequence, when
> restoring cur_mask).
>
> Fixes: bf22ff45bed ("genirq: Avoid unnecessary low level irq function calls")
> Signed-off-by: Nicolas Boichat <drinkcat@chromium.org>

Patch applied for fixes.
Hm a late ACK made me miss this, sorry.

Yours,
Linus Walleij

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 2/2] pinctrl: mediatek: Update cur_mask in mask/mask ops
  2019-06-21  4:20   ` Sean Wang
@ 2019-06-25 13:52     ` Linus Walleij
  0 siblings, 0 replies; 18+ messages in thread
From: Linus Walleij @ 2019-06-25 13:52 UTC (permalink / raw)
  To: Sean Wang
  Cc: Nicolas Boichat, moderated list:ARM/Mediatek SoC support,
	Matthias Brugger, open list:GPIO SUBSYSTEM,
	linux-arm Mailing List, lkml, Chuanjia Liu, Evan Green,
	Stephen Boyd

On Fri, Jun 21, 2019 at 6:20 AM Sean Wang <sean.wang@kernel.org> wrote:

> The change is worth rewording the commit message you added above as an instance
> and adding Fixes tag as a fixup to mean you're fixing the existing
> problem in the driver.
>
> And then Acked-by: Sean Wang <sean.wang@kernel.org>

I'm waiting for an updated version of this patch.

Yours,
Linus Walleij

^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2019-06-25 13:52 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-04-29  3:55 [PATCH 0/2] pinctrl: mediatek: Fix 2 issues related to resume from wake sources Nicolas Boichat
2019-04-29  3:55 ` [PATCH 1/2] pinctrl: mediatek: Ignore interrupts that are wake only during resume Nicolas Boichat
2019-06-21  4:47   ` Sean Wang
2019-06-25 13:50   ` Linus Walleij
2019-04-29  3:55 ` [PATCH 2/2] pinctrl: mediatek: Update cur_mask in mask/mask ops Nicolas Boichat
2019-05-13 22:29   ` Stephen Boyd
2019-05-14  1:37     ` Nicolas Boichat
2019-05-14 20:14       ` Stephen Boyd
2019-05-15  8:04         ` Nicolas Boichat
2019-05-30 17:12           ` Evan Green
2019-05-31  8:05             ` Chuanjia Liu
2019-05-31 17:17               ` Evan Green
2019-06-03  8:01                 ` Chuanjia Liu
2019-06-04  0:57                   ` Nicolas Boichat
2019-06-05  7:11                     ` Chuanjia Liu
2019-06-10 22:25             ` Stephen Boyd
2019-06-21  4:20   ` Sean Wang
2019-06-25 13:52     ` Linus Walleij

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).