linux-usb.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [RFC][PATCH] usb: typec: tcpci_rt1711h: Try to avoid screaming irq causing boot hangs
@ 2020-05-30  4:01 John Stultz
  2020-05-30 10:29 ` Jun Li
  0 siblings, 1 reply; 4+ messages in thread
From: John Stultz @ 2020-05-30  4:01 UTC (permalink / raw)
  To: lkml
  Cc: John Stultz, Guenter Roeck, Heikki Krogerus, Greg Kroah-Hartman,
	YongQin Liu, linux-usb

I've recently (since 5.7-rc1) started noticing very rare hangs
pretty early in bootup on my HiKey960 board.

They have been particularly difficult to debug, as the system
seems to not respond at all to sysrq- commands. However, the
system is alive as I'll occaionally see firmware loading timeout
errors after awhile. Adding changes like initcall_debug and
lockdep weren't informative, as it tended to cause the problem
to hide.

I finally tried to dig in a bit more on this today, and noticed
that the last dmesg output before the hang was usually:
  "random: crng init done"

So I dumped the stack at that point, and saw it was being called
from the pl061 gpio irq, and the hang always occurred when the
crng init finished on cpu 0. Instrumenting that more I could see
that when the issue triggered, we were getting a stream of irqs.

Chasing further, I found the screaming irq was for the rt1711h,
and narrowed down that we were hitting the !chip->tcpci check
which immediately returns IRQ_HANDLED, but does not stop the
irq from triggering immediately afterwards.

This patch slightly reworks the logic, so if we hit the irq
before the chip->tcpci has been assigned, we still read and
write the alert register, but just skip calling tcpci_irq().

With this change, I haven't managed to trip over the problem
(though it hasn't been super long - but I did confirm I hit
the error case and it didn't hang the system).

I still have some concern that I don't know why this cropped
up since 5.7-rc, as there haven't been any changes to the
driver since 5.4 (or before). It may just be the initialization
timing has changed due to something else, and its just exposed
this issue? I'm not sure, and that's not super re-assuring.

Anyway, I'd love to hear your thoughts if this looks like a sane
fix or not.

Cc: Guenter Roeck <linux@roeck-us.net>
Cc: Heikki Krogerus <heikki.krogerus@linux.intel.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: YongQin Liu <yongqin.liu@linaro.org>
Cc: linux-usb@vger.kernel.org
Signed-off-by: John Stultz <john.stultz@linaro.org>
---
 drivers/usb/typec/tcpm/tcpci_rt1711h.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/usb/typec/tcpm/tcpci_rt1711h.c b/drivers/usb/typec/tcpm/tcpci_rt1711h.c
index 017389021b96..530fd2c111ad 100644
--- a/drivers/usb/typec/tcpm/tcpci_rt1711h.c
+++ b/drivers/usb/typec/tcpm/tcpci_rt1711h.c
@@ -159,9 +159,6 @@ static irqreturn_t rt1711h_irq(int irq, void *dev_id)
 	u8 status;
 	struct rt1711h_chip *chip = dev_id;
 
-	if (!chip->tcpci)
-		return IRQ_HANDLED;
-
 	ret = rt1711h_read16(chip, TCPC_ALERT, &alert);
 	if (ret < 0)
 		goto out;
@@ -176,6 +173,9 @@ static irqreturn_t rt1711h_irq(int irq, void *dev_id)
 	}
 
 out:
+	if (!chip->tcpci)
+		return IRQ_HANDLED;
+
 	return tcpci_irq(chip->tcpci);
 }
 
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 4+ messages in thread

* Re: [RFC][PATCH] usb: typec: tcpci_rt1711h: Try to avoid screaming irq causing boot hangs
  2020-05-30  4:01 [RFC][PATCH] usb: typec: tcpci_rt1711h: Try to avoid screaming irq causing boot hangs John Stultz
@ 2020-05-30 10:29 ` Jun Li
  2020-06-01 20:39   ` John Stultz
  0 siblings, 1 reply; 4+ messages in thread
From: Jun Li @ 2020-05-30 10:29 UTC (permalink / raw)
  To: John Stultz
  Cc: lkml, Guenter Roeck, Heikki Krogerus, Greg Kroah-Hartman,
	YongQin Liu, Linux USB List

Hi John,

John Stultz <john.stultz@linaro.org> 于2020年5月30日周六 下午12:02写道:
>
> I've recently (since 5.7-rc1) started noticing very rare hangs
> pretty early in bootup on my HiKey960 board.
>
> They have been particularly difficult to debug, as the system
> seems to not respond at all to sysrq- commands. However, the
> system is alive as I'll occaionally see firmware loading timeout
> errors after awhile. Adding changes like initcall_debug and
> lockdep weren't informative, as it tended to cause the problem
> to hide.
>
> I finally tried to dig in a bit more on this today, and noticed
> that the last dmesg output before the hang was usually:
>   "random: crng init done"
>
> So I dumped the stack at that point, and saw it was being called
> from the pl061 gpio irq, and the hang always occurred when the
> crng init finished on cpu 0. Instrumenting that more I could see
> that when the issue triggered, we were getting a stream of irqs.
>
> Chasing further, I found the screaming irq was for the rt1711h,
> and narrowed down that we were hitting the !chip->tcpci check
> which immediately returns IRQ_HANDLED, but does not stop the
> irq from triggering immediately afterwards.
>
> This patch slightly reworks the logic, so if we hit the irq
> before the chip->tcpci has been assigned, we still read and
> write the alert register, but just skip calling tcpci_irq().
>
> With this change, I haven't managed to trip over the problem
> (though it hasn't been super long - but I did confirm I hit
> the error case and it didn't hang the system).
>
> I still have some concern that I don't know why this cropped
> up since 5.7-rc, as there haven't been any changes to the
> driver since 5.4 (or before). It may just be the initialization
> timing has changed due to something else, and its just exposed
> this issue? I'm not sure, and that's not super re-assuring.
>
> Anyway, I'd love to hear your thoughts if this looks like a sane
> fix or not.

I think a better solution may be move the irq request after port register,
we should fire the irq after everything is setup.
does below change works for you?

diff --git a/drivers/usb/typec/tcpm/tcpci_rt1711h.c
b/drivers/usb/typec/tcpm/tcpci_rt1711h.c
index 0173890..b56a088 100644
--- a/drivers/usb/typec/tcpm/tcpci_rt1711h.c
+++ b/drivers/usb/typec/tcpm/tcpci_rt1711h.c
@@ -179,26 +179,6 @@ static irqreturn_t rt1711h_irq(int irq, void *dev_id)
        return tcpci_irq(chip->tcpci);
 }

-static int rt1711h_init_alert(struct rt1711h_chip *chip,
-                             struct i2c_client *client)
-{
-       int ret;
-
-       /* Disable chip interrupts before requesting irq */
-       ret = rt1711h_write16(chip, TCPC_ALERT_MASK, 0);
-       if (ret < 0)
-               return ret;
-
-       ret = devm_request_threaded_irq(chip->dev, client->irq, NULL,
-                                       rt1711h_irq,
-                                       IRQF_ONESHOT | IRQF_TRIGGER_LOW,
-                                       dev_name(chip->dev), chip);
-       if (ret < 0)
-               return ret;
-       enable_irq_wake(client->irq);
-       return 0;
-}
-
 static int rt1711h_sw_reset(struct rt1711h_chip *chip)
 {
        int ret;
@@ -260,7 +240,8 @@ static int rt1711h_probe(struct i2c_client *client,
        if (ret < 0)
                return ret;

-       ret = rt1711h_init_alert(chip, client);
+       /* Disable chip interrupts before requesting irq */
+       ret = rt1711h_write16(chip, TCPC_ALERT_MASK, 0);
        if (ret < 0)
                return ret;

@@ -271,6 +252,14 @@ static int rt1711h_probe(struct i2c_client *client,
        if (IS_ERR_OR_NULL(chip->tcpci))
                return PTR_ERR(chip->tcpci);

+       ret = devm_request_threaded_irq(chip->dev, client->irq, NULL,
+                                       rt1711h_irq,
+                                       IRQF_ONESHOT | IRQF_TRIGGER_LOW,
+                                       dev_name(chip->dev), chip);
+       if (ret < 0)
+               return ret;
+       enable_irq_wake(client->irq);
+
        return 0;
 }

Li Jun
>
> Cc: Guenter Roeck <linux@roeck-us.net>
> Cc: Heikki Krogerus <heikki.krogerus@linux.intel.com>
> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> Cc: YongQin Liu <yongqin.liu@linaro.org>
> Cc: linux-usb@vger.kernel.org
> Signed-off-by: John Stultz <john.stultz@linaro.org>
> ---
>  drivers/usb/typec/tcpm/tcpci_rt1711h.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/usb/typec/tcpm/tcpci_rt1711h.c b/drivers/usb/typec/tcpm/tcpci_rt1711h.c
> index 017389021b96..530fd2c111ad 100644
> --- a/drivers/usb/typec/tcpm/tcpci_rt1711h.c
> +++ b/drivers/usb/typec/tcpm/tcpci_rt1711h.c
> @@ -159,9 +159,6 @@ static irqreturn_t rt1711h_irq(int irq, void *dev_id)
>         u8 status;
>         struct rt1711h_chip *chip = dev_id;
>
> -       if (!chip->tcpci)
> -               return IRQ_HANDLED;
> -
>         ret = rt1711h_read16(chip, TCPC_ALERT, &alert);
>         if (ret < 0)
>                 goto out;
> @@ -176,6 +173,9 @@ static irqreturn_t rt1711h_irq(int irq, void *dev_id)
>         }
>
>  out:
> +       if (!chip->tcpci)
> +               return IRQ_HANDLED;
> +
>         return tcpci_irq(chip->tcpci);
>  }
>
> --
> 2.17.1
>

^ permalink raw reply related	[flat|nested] 4+ messages in thread

* Re: [RFC][PATCH] usb: typec: tcpci_rt1711h: Try to avoid screaming irq causing boot hangs
  2020-05-30 10:29 ` Jun Li
@ 2020-06-01 20:39   ` John Stultz
  2020-06-02  6:45     ` Jun Li
  0 siblings, 1 reply; 4+ messages in thread
From: John Stultz @ 2020-06-01 20:39 UTC (permalink / raw)
  To: Jun Li
  Cc: lkml, Guenter Roeck, Heikki Krogerus, Greg Kroah-Hartman,
	YongQin Liu, Linux USB List

On Sat, May 30, 2020 at 3:30 AM Jun Li <lijun.kernel@gmail.com> wrote:
>
> Hi John,
>
> John Stultz <john.stultz@linaro.org> 于2020年5月30日周六 下午12:02写道:
> >
> > I've recently (since 5.7-rc1) started noticing very rare hangs
> > pretty early in bootup on my HiKey960 board.
> >
> > They have been particularly difficult to debug, as the system
> > seems to not respond at all to sysrq- commands. However, the
> > system is alive as I'll occaionally see firmware loading timeout
> > errors after awhile. Adding changes like initcall_debug and
> > lockdep weren't informative, as it tended to cause the problem
> > to hide.
> >
> > I finally tried to dig in a bit more on this today, and noticed
> > that the last dmesg output before the hang was usually:
> >   "random: crng init done"
> >
> > So I dumped the stack at that point, and saw it was being called
> > from the pl061 gpio irq, and the hang always occurred when the
> > crng init finished on cpu 0. Instrumenting that more I could see
> > that when the issue triggered, we were getting a stream of irqs.
> >
> > Chasing further, I found the screaming irq was for the rt1711h,
> > and narrowed down that we were hitting the !chip->tcpci check
> > which immediately returns IRQ_HANDLED, but does not stop the
> > irq from triggering immediately afterwards.
> >
> > This patch slightly reworks the logic, so if we hit the irq
> > before the chip->tcpci has been assigned, we still read and
> > write the alert register, but just skip calling tcpci_irq().
> >
> > With this change, I haven't managed to trip over the problem
> > (though it hasn't been super long - but I did confirm I hit
> > the error case and it didn't hang the system).
> >
> > I still have some concern that I don't know why this cropped
> > up since 5.7-rc, as there haven't been any changes to the
> > driver since 5.4 (or before). It may just be the initialization
> > timing has changed due to something else, and its just exposed
> > this issue? I'm not sure, and that's not super re-assuring.
> >
> > Anyway, I'd love to hear your thoughts if this looks like a sane
> > fix or not.
>
> I think a better solution may be move the irq request after port register,
> we should fire the irq after everything is setup.
> does below change works for you?

Unfortunately the patch didn't seem to apply, but I recreated it by
hand. I agree this looks like it should address the issue and I've not
managed to trigger the problem in my (admittedly somewhat brief)
attempts at testing.

Thanks for sending it out. Do you want to submit the patch and I'll
provide a Tested-by tag, or would it help for me to submit your
suggested change?

thanks
-john

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [RFC][PATCH] usb: typec: tcpci_rt1711h: Try to avoid screaming irq causing boot hangs
  2020-06-01 20:39   ` John Stultz
@ 2020-06-02  6:45     ` Jun Li
  0 siblings, 0 replies; 4+ messages in thread
From: Jun Li @ 2020-06-02  6:45 UTC (permalink / raw)
  To: John Stultz
  Cc: lkml, Guenter Roeck, Heikki Krogerus, Greg Kroah-Hartman,
	YongQin Liu, Linux USB List

Hi,
John Stultz <john.stultz@linaro.org> 于2020年6月2日周二 上午4:39写道:
>
> On Sat, May 30, 2020 at 3:30 AM Jun Li <lijun.kernel@gmail.com> wrote:
> >
> > Hi John,
> >
> > John Stultz <john.stultz@linaro.org> 于2020年5月30日周六 下午12:02写道:
> > >
> > > I've recently (since 5.7-rc1) started noticing very rare hangs
> > > pretty early in bootup on my HiKey960 board.
> > >
> > > They have been particularly difficult to debug, as the system
> > > seems to not respond at all to sysrq- commands. However, the
> > > system is alive as I'll occaionally see firmware loading timeout
> > > errors after awhile. Adding changes like initcall_debug and
> > > lockdep weren't informative, as it tended to cause the problem
> > > to hide.
> > >
> > > I finally tried to dig in a bit more on this today, and noticed
> > > that the last dmesg output before the hang was usually:
> > >   "random: crng init done"
> > >
> > > So I dumped the stack at that point, and saw it was being called
> > > from the pl061 gpio irq, and the hang always occurred when the
> > > crng init finished on cpu 0. Instrumenting that more I could see
> > > that when the issue triggered, we were getting a stream of irqs.
> > >
> > > Chasing further, I found the screaming irq was for the rt1711h,
> > > and narrowed down that we were hitting the !chip->tcpci check
> > > which immediately returns IRQ_HANDLED, but does not stop the
> > > irq from triggering immediately afterwards.
> > >
> > > This patch slightly reworks the logic, so if we hit the irq
> > > before the chip->tcpci has been assigned, we still read and
> > > write the alert register, but just skip calling tcpci_irq().
> > >
> > > With this change, I haven't managed to trip over the problem
> > > (though it hasn't been super long - but I did confirm I hit
> > > the error case and it didn't hang the system).
> > >
> > > I still have some concern that I don't know why this cropped
> > > up since 5.7-rc, as there haven't been any changes to the
> > > driver since 5.4 (or before). It may just be the initialization
> > > timing has changed due to something else, and its just exposed
> > > this issue? I'm not sure, and that's not super re-assuring.
> > >
> > > Anyway, I'd love to hear your thoughts if this looks like a sane
> > > fix or not.
> >
> > I think a better solution may be move the irq request after port register,
> > we should fire the irq after everything is setup.
> > does below change works for you?
>
> Unfortunately the patch didn't seem to apply, but I recreated it by
> hand. I agree this looks like it should address the issue and I've not
> managed to trigger the problem in my (admittedly somewhat brief)
> attempts at testing.
>
> Thanks for sending it out. Do you want to submit the patch and I'll
> provide a Tested-by tag, or would it help for me to submit your
> suggested change?

OK, I will send out a patch against Greg's tree.

Li Jun
>
> thanks
> -john

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2020-06-02  6:45 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-05-30  4:01 [RFC][PATCH] usb: typec: tcpci_rt1711h: Try to avoid screaming irq causing boot hangs John Stultz
2020-05-30 10:29 ` Jun Li
2020-06-01 20:39   ` John Stultz
2020-06-02  6:45     ` Jun Li

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).