linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v3] genirq: Check irq disabled & masked states in irq_shutdown
@ 2017-05-27 10:05 Jeffy Chen
  2017-05-27 11:12 ` Thomas Gleixner
  0 siblings, 1 reply; 6+ messages in thread
From: Jeffy Chen @ 2017-05-27 10:05 UTC (permalink / raw)
  To: linux-kernel, tglx; +Cc: briannorris, dianders, tfiga, Jeffy Chen

If a irq is already disabled, irq_shutdown may try to disable it again,
for example:
devm_request_irq->irq_startup->irq_enable
disable_irq					<-- disabled
devm_free_irq->irq_shutdown			<-- disable it again

This would confuse some chips which require balanced irq enable and
disable, for example pinctrl-rockchip & pinctrl-nomadik.

Add a state check before calling irq_disable to prevent that.

v2: Rewrite commit message.
v3: Rewrite commit message and not skip irq_shutdown.

Signed-off-by: Jeffy Chen <jeffy.chen@rock-chips.com>
---

Changes in v3:
Rewrite commit message and not skip irq_shutdown.

Changes in v2:
Rewrite commit message.

 kernel/irq/chip.c | 14 ++++++++++++--
 1 file changed, 12 insertions(+), 2 deletions(-)

diff --git a/kernel/irq/chip.c b/kernel/irq/chip.c
index 686be4b..0ac0c56 100644
--- a/kernel/irq/chip.c
+++ b/kernel/irq/chip.c
@@ -206,13 +206,23 @@ int irq_startup(struct irq_desc *desc, bool resend)
 
 void irq_shutdown(struct irq_desc *desc)
 {
+	int irq_disabled = irqd_irq_disabled(&desc->irq_data);
+
 	irq_state_set_disabled(desc);
 	desc->depth = 1;
 	if (desc->irq_data.chip->irq_shutdown)
 		desc->irq_data.chip->irq_shutdown(&desc->irq_data);
-	else if (desc->irq_data.chip->irq_disable)
+	/*
+	 * 1/ Some chips may require balanced irq enable &_disable.
+	 * 2/ Due to the lazy disable approach, a irq could be
+	 *    disabled but unmasked.
+	 *
+	 * So if this irq is already disabled, let's mask it instead
+	 * of trying to call irq_disable again.
+	 */
+	else if (desc->irq_data.chip->irq_disable && !irq_disabled)
 		desc->irq_data.chip->irq_disable(&desc->irq_data);
-	else
+	else if (desc->irq_data.chip->irq_mask)
 		desc->irq_data.chip->irq_mask(&desc->irq_data);
 	irq_domain_deactivate_irq(&desc->irq_data);
 	irq_state_set_masked(desc);
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH v3] genirq: Check irq disabled & masked states in irq_shutdown
  2017-05-27 10:05 [PATCH v3] genirq: Check irq disabled & masked states in irq_shutdown Jeffy Chen
@ 2017-05-27 11:12 ` Thomas Gleixner
  2017-05-27 15:11   ` Tomasz Figa
  0 siblings, 1 reply; 6+ messages in thread
From: Thomas Gleixner @ 2017-05-27 11:12 UTC (permalink / raw)
  To: Jeffy Chen; +Cc: linux-kernel, briannorris, dianders, tfiga

On Sat, 27 May 2017, Jeffy Chen wrote:

> If a irq is already disabled, irq_shutdown may try to disable it again,
> for example:
> devm_request_irq->irq_startup->irq_enable
> disable_irq					<-- disabled
> devm_free_irq->irq_shutdown			<-- disable it again
> 
> This would confuse some chips which require balanced irq enable and
> disable, for example pinctrl-rockchip & pinctrl-nomadik.

"would confuse" is an interesting technical term. Can you please start to
explain things in coherent sentences so people who do not have detailed
knowledge of the problem at hand can understand it?

> Add a state check before calling irq_disable to prevent that.

So you analyzed in half an hour that all of this is correct and fixes all
possible imbalances, right?  Really impressive!

You just forgot to mention this in the changelog.

> v2: Rewrite commit message.
> v3: Rewrite commit message and not skip irq_shutdown.

This does not belong into the changelog and having that information twice
is more than pointless.

Before you send yet another version of this within 5 minutes, can you
please sit down and analyze all the possible imbalance scenarios?

I'm not going to apply hastiliy cobbled together workarounds which "fix"
just half of the problem space. This is not random driver code which breaks
ONE type of machine. This is core code which has the potential to break ALL
machines out there in one go.

Either you come up with a properly analyzed solution which addresses all
possible imbalanced invocations or you have to wait until I find some time
to look at it myself.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH v3] genirq: Check irq disabled & masked states in irq_shutdown
  2017-05-27 11:12 ` Thomas Gleixner
@ 2017-05-27 15:11   ` Tomasz Figa
  2017-05-29  6:56     ` Thomas Gleixner
  0 siblings, 1 reply; 6+ messages in thread
From: Tomasz Figa @ 2017-05-27 15:11 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: Jeffy Chen, linux-kernel, Brian Norris, Douglas Anderson

Hi Thomas,

Thank you for your comments. Please see my replies inline.

On Sat, May 27, 2017 at 8:12 PM, Thomas Gleixner <tglx@linutronix.de> wrote:
> On Sat, 27 May 2017, Jeffy Chen wrote:
>
>> If a irq is already disabled, irq_shutdown may try to disable it again,
>> for example:
>> devm_request_irq->irq_startup->irq_enable
>> disable_irq                                   <-- disabled
>> devm_free_irq->irq_shutdown                   <-- disable it again
>>
>> This would confuse some chips which require balanced irq enable and
>> disable, for example pinctrl-rockchip & pinctrl-nomadik.
>
> "would confuse" is an interesting technical term. Can you please start to
> explain things in coherent sentences so people who do not have detailed
> knowledge of the problem at hand can understand it?

I think we might simply have a language barrier here unfortunately. I
agree, though, that we need a better description of the problem. Next
time we will help Jeffy with polishing the commit message. Please
forgive him this time, as he is still learning "the art" of mainline
patch submission.

>
>> Add a state check before calling irq_disable to prevent that.
>
> So you analyzed in half an hour that all of this is correct and fixes all
> possible imbalances, right?  Really impressive!
>
> You just forgot to mention this in the changelog.

I think this patch should have been an RFC to begin with, as it's just
supposed to show the problem we found and start a discussion on a way
to fix it.

>
>> v2: Rewrite commit message.
>> v3: Rewrite commit message and not skip irq_shutdown.
>
> This does not belong into the changelog and having that information twice
> is more than pointless.
>
> Before you send yet another version of this within 5 minutes, can you
> please sit down and analyze all the possible imbalance scenarios?

I think we should start from actually figuring out where the problem is.

The IRQ functionality provided by the pinctrl-rockchip has a power
saving mechanism that attempts to gate clocks whenever there are no
enabled interrupts. Currently the driver calls clk_enable() in
.irq_enable() and clk_disable() in .irq_disable() callbacks of its IRQ
chip. However there is no mention about ordering or reference counting
of those in code documentation (which is likely to mean that there are
no ordering guarantees and/or unbalanced calls may happen, please
correct me if I'm wrong).

We noticed that following scenario causes unbalanced clk_disable() calls:
 1) request_irq(),
 2) disable_irq(),
 3) free_irq().

After checking what's going on, we found that free_irq() ends up
calling irq_shutdown(), which defaults to chip's .irq_disable() if it
doesn't have .irq_shutdown() specified. This means that regardless of
whether disable_irq() was or wasn't called before, there is one more
call to .irq_disable() after free_irq(), which is the reason for our
unbalanced clk_disable() call from pinctrl-rockchip IRQ chip.

Now, we can simply hack the driver to not rely on any ordering of
.enable/disable_irq() calls, but we thought it would make more sense
to actually try to figure out whether it's the expected behavior from
the IRQ chip core code.

>
> I'm not going to apply hastiliy cobbled together workarounds which "fix"
> just half of the problem space. This is not random driver code which breaks
> ONE type of machine. This is core code which has the potential to break ALL
> machines out there in one go.
>
> Either you come up with a properly analyzed solution which addresses all
> possible imbalanced invocations or you have to wait until I find some time
> to look at it myself.

As I explained above, we might have introduced unnecessary confusion
here. Please forgive us. On the other hand, I'd like to ask for a bit
more understanding, as we have people involved in this, who are still
learning the tough art of upstream submission, potentially also behind
a language barrier and I'm pretty much sure they are more than happy
to address all your concerns, but would be much more motivated to do
the right thing if guided in a bit more humane manner. Thanks in
advance.

Best regards,
Tomasz

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH v3] genirq: Check irq disabled & masked states in irq_shutdown
  2017-05-27 15:11   ` Tomasz Figa
@ 2017-05-29  6:56     ` Thomas Gleixner
       [not found]       ` <tivl20goiv96pkmvncdkqvud.1496054861146@email.android.com>
  0 siblings, 1 reply; 6+ messages in thread
From: Thomas Gleixner @ 2017-05-29  6:56 UTC (permalink / raw)
  To: Tomasz Figa; +Cc: Jeffy Chen, linux-kernel, Brian Norris, Douglas Anderson

Tomasz,

On Sun, 28 May 2017, Tomasz Figa wrote:
> On Sat, May 27, 2017 at 8:12 PM, Thomas Gleixner <tglx@linutronix.de> wrote:
> I think we might simply have a language barrier here unfortunately. I
> agree, though, that we need a better description of the problem. Next
> time we will help Jeffy with polishing the commit message. Please
> forgive him this time, as he is still learning "the art" of mainline
> patch submission.

No problem.

> The IRQ functionality provided by the pinctrl-rockchip has a power
> saving mechanism that attempts to gate clocks whenever there are no
> enabled interrupts. Currently the driver calls clk_enable() in
> .irq_enable() and clk_disable() in .irq_disable() callbacks of its IRQ
> chip. However there is no mention about ordering or reference counting
> of those in code documentation (which is likely to mean that there are
> no ordering guarantees and/or unbalanced calls may happen, please
> correct me if I'm wrong).

Correct. We try to avoid unbalanced calls, but it's not complete.

> We noticed that following scenario causes unbalanced clk_disable() calls:
>  1) request_irq(),
>  2) disable_irq(),
>  3) free_irq().
> 
> After checking what's going on, we found that free_irq() ends up
> calling irq_shutdown(), which defaults to chip's .irq_disable() if it
> doesn't have .irq_shutdown() specified. This means that regardless of
> whether disable_irq() was or wasn't called before, there is one more
> call to .irq_disable() after free_irq(), which is the reason for our
> unbalanced clk_disable() call from pinctrl-rockchip IRQ chip.
> 
> Now, we can simply hack the driver to not rely on any ordering of
> .enable/disable_irq() calls, but we thought it would make more sense
> to actually try to figure out whether it's the expected behavior from
> the IRQ chip core code.

Yes. But there are several ways this can happen not only via the above
scenario.

> > Either you come up with a properly analyzed solution which addresses all
> > possible imbalanced invocations or you have to wait until I find some time
> > to look at it myself.
> 
> As I explained above, we might have introduced unnecessary confusion
> here. Please forgive us. On the other hand, I'd like to ask for a bit
> more understanding, as we have people involved in this, who are still
> learning the tough art of upstream submission, potentially also behind
> a language barrier and I'm pretty much sure they are more than happy
> to address all your concerns, but would be much more motivated to do
> the right thing if guided in a bit more humane manner. Thanks in
> advance.

Sorry, if I replied more harsh than necessary. I have neither a problem
with language barriers nor with patches which are not perfect.

The reason why I got a bit grumpy was the fact that I pointed out on my
reply to V2:

 ... irq_shutdown() is only one place where this can happen. This needs
 more thought ...

as a reaction I get yet another variant of the same patch fiddling in
exactly one function, i.e. irq_shutdown.

I didn't want to offend Jerry, but may I please ask that my review comments
are taken seriously and properly addressed. If there are questions then
better ask than ignore.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH v3] genirq: Check irq disabled & masked states in irq_shutdown
       [not found]       ` <tivl20goiv96pkmvncdkqvud.1496054861146@email.android.com>
@ 2017-05-30 23:02         ` Thomas Gleixner
  2017-06-26  6:22           ` jeffy
  0 siblings, 1 reply; 6+ messages in thread
From: Thomas Gleixner @ 2017-05-30 23:02 UTC (permalink / raw)
  To: jeffy.chen
  Cc: Tomasz Figa, LKML, Brian Norris, Douglas Anderson, Marc Zyngier

On Mon, 29 May 2017, jeffy.chen wrote:
> i think if we want to make all irq enable/disable balance, maybe we can:
> 
> 1/ only call irq_enable/disable from enable/disable_irq(change other
> irq_enable/disable to enable/disable_irq), so they would be protected by the
> refcnt(deph)

You cannot call enable/disable_irq() from places which call
irq_enable/disable() due to locking reasons.

__disable_irq()/__enable_irq() can/must be called with desc->lock held, but
__enable_irq() does more than just calling irq_enable().

So no, it's not that simple.

> 2/ disable lazy stuff in irq_shoutdown(we already did this in free_irq)

No, irq_shutdown() is called from other places as well. 

> 3/ in irq_shutdown, set depth to 0 if it's not disabled and masked(for lazy stuff)
> before calling disable_irq, 

Uurgh, no. That's a unholy hack.

So what should be done to fix this is to make consequent use of the state bits.

     irq_disable()
        if (irqd_irq_disabled())
	   return;
	irqd_set_irq_disabled();
	....

This should be done for both mask/unmask disable/enable. You get the idea.

We probably want a third state bit for STARTED_UP and do the same dance in
startup/shutdown as well. Which brings me to a different issue, which is
outside the scope of your problem, but looking at the code made me find it.

If a interrupt is marked IRQ_NOAUTOEN then request_irq() will not invoke
irq_startup(). The interrupt is just completely set up, but stays disabled.
It is enabled later via enable_irq(). That works so far with no complaints,
but there is an interesting twist:

In that NOAUTOEN case nothing ever calls irq_startup() on that irq, which
means that in case the irq_chip has a irq_startup() callback nothing will
invoke it and also irq_domain_activate_irq() will never be invoked on such
an irq.

Looks like all implementations which use IRQ_NOAUTOEN are not sensitive to
that. It's been broken forever.

Fixing this needs the extra state bit IRQD_ STARTED_UP as we cannot reuse
the IRQD_ACTIVATED bit because some of the interrupts are actually
activated before they are requested.

Too tired to fix that now, but I'll have a look tomorrow. Once this is
fixed, you can add the extra bits to prevent this disable/enable calls
which cause the imbalance deeper down.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH v3] genirq: Check irq disabled & masked states in irq_shutdown
  2017-05-30 23:02         ` Thomas Gleixner
@ 2017-06-26  6:22           ` jeffy
  0 siblings, 0 replies; 6+ messages in thread
From: jeffy @ 2017-06-26  6:22 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Tomasz Figa, LKML, Brian Norris, Douglas Anderson, Marc Zyngier

Hi Thomas Gleixner,

On 05/31/2017 07:02 AM, Thomas Gleixner wrote:
> On Mon, 29 May 2017, jeffy.chen wrote:
>> i think if we want to make all irq enable/disable balance, maybe we can:
>>
>> 1/ only call irq_enable/disable from enable/disable_irq(change other
>> irq_enable/disable to enable/disable_irq), so they would be protected by the
>> refcnt(deph)
>
> You cannot call enable/disable_irq() from places which call
> irq_enable/disable() due to locking reasons.
>
> __disable_irq()/__enable_irq() can/must be called with desc->lock held, but
> __enable_irq() does more than just calling irq_enable().
>
> So no, it's not that simple.
>
>> 2/ disable lazy stuff in irq_shoutdown(we already did this in free_irq)
>
> No, irq_shutdown() is called from other places as well.
>
>> 3/ in irq_shutdown, set depth to 0 if it's not disabled and masked(for lazy stuff)
>> before calling disable_irq,
>
> Uurgh, no. That's a unholy hack.
>
> So what should be done to fix this is to make consequent use of the state bits.
>
>       irq_disable()
>          if (irqd_irq_disabled())
> 	   return;
> 	irqd_set_irq_disabled();
> 	....
>
> This should be done for both mask/unmask disable/enable. You get the idea.
ok
>
> We probably want a third state bit for STARTED_UP and do the same dance in
> startup/shutdown as well. Which brings me to a different issue, which is
> outside the scope of your problem, but looking at the code made me find it.
>
> If a interrupt is marked IRQ_NOAUTOEN then request_irq() will not invoke
> irq_startup(). The interrupt is just completely set up, but stays disabled.
> It is enabled later via enable_irq(). That works so far with no complaints,
> but there is an interesting twist:
>
> In that NOAUTOEN case nothing ever calls irq_startup() on that irq, which
> means that in case the irq_chip has a irq_startup() callback nothing will
> invoke it and also irq_domain_activate_irq() will never be invoked on such
> an irq.
>
> Looks like all implementations which use IRQ_NOAUTOEN are not sensitive to
> that. It's been broken forever.
>
> Fixing this needs the extra state bit IRQD_ STARTED_UP as we cannot reuse
> the IRQD_ACTIVATED bit because some of the interrupts are actually
> activated before they are requested.
>
> Too tired to fix that now, but I'll have a look tomorrow. Once this is
> fixed, you can add the extra bits to prevent this disable/enable calls
> which cause the imbalance deeper down.
i saw your patches landed, so i sent a patch for 
enable/disable/unmask/mask_irq, please help to review :)
>
> Thanks,
>
> 	tglx
>
>
>
>
>
>
>
>
>
>
>
>

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2017-06-26  6:23 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-05-27 10:05 [PATCH v3] genirq: Check irq disabled & masked states in irq_shutdown Jeffy Chen
2017-05-27 11:12 ` Thomas Gleixner
2017-05-27 15:11   ` Tomasz Figa
2017-05-29  6:56     ` Thomas Gleixner
     [not found]       ` <tivl20goiv96pkmvncdkqvud.1496054861146@email.android.com>
2017-05-30 23:02         ` Thomas Gleixner
2017-06-26  6:22           ` jeffy

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).