linux-integrity.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] tpm_tis_core: Disable broken IRQ handling code
@ 2020-04-09 21:10 Hans de Goede
  2020-04-10 16:38 ` Jason Gunthorpe
  2020-04-10 21:06 ` Jarkko Sakkinen
  0 siblings, 2 replies; 22+ messages in thread
From: Hans de Goede @ 2020-04-09 21:10 UTC (permalink / raw)
  To: Peter Huewe, Jarkko Sakkinen, Jason Gunthorpe, Jerry Snitselaar,
	Stefan Berger
  Cc: Hans de Goede, Arnd Bergmann, Greg Kroah-Hartman, linux-integrity

Since commit dda8b2af395b ("tpm: Revert "tpm_tis_core: Set
TPM_CHIP_FLAG_IRQ before probing for interrupts"") we no longer set
the TPM_CHIP_FLAG_IRQ ever.

So the whole IRQ probing code is not useful, worse we rely on the
IRQ-test path of tpm_tis_send() to call disable_interrupts() if
interrupts do not work, but that path never gets entered because we
never set the TPM_CHIP_FLAG_IRQ.

So the remaining IRQ probe code calls request_irq() and never calls
free_irq() even when the interrupt is not working.

On some systems, e.g. the Lenovo X1 8th gen,  the interrupt we try
to use and never free creates an interrupt storm followed by
an "irq XX: nobody cared" oops.

Since it is non-functional at the moment anyways, lets just completely
disable the IRQ code in tpm_tis_core for now.

Fixes: dda8b2af395b ("tpm: Revert "tpm_tis_core: Set TPM_CHIP_FLAG_IRQ before probing for interrupts"")
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
---
Note I'm working with Lenovo to try and get to the bottom of this.
---
 drivers/char/tpm/tpm_tis_core.c | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/drivers/char/tpm/tpm_tis_core.c b/drivers/char/tpm/tpm_tis_core.c
index 27c6ca031e23..647a4a4ccd0c 100644
--- a/drivers/char/tpm/tpm_tis_core.c
+++ b/drivers/char/tpm/tpm_tis_core.c
@@ -697,6 +697,7 @@ static bool tpm_tis_req_canceled(struct tpm_chip *chip, u8 status)
 	}
 }
 
+#if 0 /* See the comment in tpm_tis_core_init */
 static irqreturn_t tis_int_handler(int dummy, void *dev_id)
 {
 	struct tpm_chip *chip = dev_id;
@@ -838,6 +839,7 @@ static void tpm_tis_probe_irq(struct tpm_chip *chip, u32 intmask)
 					     original_int_vec))
 		return;
 }
+#endif
 
 void tpm_tis_remove(struct tpm_chip *chip)
 {
@@ -1048,6 +1050,14 @@ int tpm_tis_core_init(struct device *dev, struct tpm_tis_data *priv, int irq,
 	/* INTERRUPT Setup */
 	init_waitqueue_head(&priv->read_queue);
 	init_waitqueue_head(&priv->int_queue);
+/*
+ * Interrupt support is broken ATM, we never set TPM_CHIP_FLAG_IRQ.
+ * The below code still registers an interrupt handler even though we never
+ * wait for the wait_queues it signals. On some systems the interrupt we try
+ * to use creates an interrupt storm followed by an "irq XX: nobody cared"
+ * oops. So disable this code for now.
+ */
+#if 0
 	if (irq != -1) {
 		/* Before doing irq testing issue a command to the TPM in polling mode
 		 * to make sure it works. May as well use that command to set the
@@ -1069,6 +1079,7 @@ int tpm_tis_core_init(struct device *dev, struct tpm_tis_data *priv, int irq,
 			tpm_tis_probe_irq(chip, intmask);
 		}
 	}
+#endif
 
 	rc = tpm_chip_register(chip);
 	if (rc)
-- 
2.26.0


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: [PATCH] tpm_tis_core: Disable broken IRQ handling code
  2020-04-09 21:10 [PATCH] tpm_tis_core: Disable broken IRQ handling code Hans de Goede
@ 2020-04-10 16:38 ` Jason Gunthorpe
  2020-04-10 21:08   ` Jarkko Sakkinen
                     ` (2 more replies)
  2020-04-10 21:06 ` Jarkko Sakkinen
  1 sibling, 3 replies; 22+ messages in thread
From: Jason Gunthorpe @ 2020-04-10 16:38 UTC (permalink / raw)
  To: Hans de Goede
  Cc: Peter Huewe, Jarkko Sakkinen, Jerry Snitselaar, Stefan Berger,
	Arnd Bergmann, Greg Kroah-Hartman, linux-integrity

On Thu, Apr 09, 2020 at 11:10:44PM +0200, Hans de Goede wrote:
> Since commit dda8b2af395b ("tpm: Revert "tpm_tis_core: Set
> TPM_CHIP_FLAG_IRQ before probing for interrupts"") we no longer set
> the TPM_CHIP_FLAG_IRQ ever.

This all used to work..

> So the whole IRQ probing code is not useful, worse we rely on the
> IRQ-test path of tpm_tis_send() to call disable_interrupts() if
> interrupts do not work, but that path never gets entered because we
> never set the TPM_CHIP_FLAG_IRQ.

The IRQ probing code should be deleted.
 
> On some systems, e.g. the Lenovo X1 8th gen,  the interrupt we try
> to use and never free creates an interrupt storm followed by
> an "irq XX: nobody cared" oops.

Is this related to probing?

Jason

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] tpm_tis_core: Disable broken IRQ handling code
  2020-04-09 21:10 [PATCH] tpm_tis_core: Disable broken IRQ handling code Hans de Goede
  2020-04-10 16:38 ` Jason Gunthorpe
@ 2020-04-10 21:06 ` Jarkko Sakkinen
  2020-04-10 21:09   ` Jarkko Sakkinen
                     ` (2 more replies)
  1 sibling, 3 replies; 22+ messages in thread
From: Jarkko Sakkinen @ 2020-04-10 21:06 UTC (permalink / raw)
  To: Hans de Goede
  Cc: Peter Huewe, Jason Gunthorpe, Jerry Snitselaar, Stefan Berger,
	Arnd Bergmann, Greg Kroah-Hartman, linux-integrity

On Thu, Apr 09, 2020 at 11:10:44PM +0200, Hans de Goede wrote:
> Since commit dda8b2af395b ("tpm: Revert "tpm_tis_core: Set
> TPM_CHIP_FLAG_IRQ before probing for interrupts"") we no longer set
> the TPM_CHIP_FLAG_IRQ ever.
> 
> So the whole IRQ probing code is not useful, worse we rely on the
> IRQ-test path of tpm_tis_send() to call disable_interrupts() if
> interrupts do not work, but that path never gets entered because we
> never set the TPM_CHIP_FLAG_IRQ.
> 
> So the remaining IRQ probe code calls request_irq() and never calls
> free_irq() even when the interrupt is not working.
> 
> On some systems, e.g. the Lenovo X1 8th gen,  the interrupt we try
> to use and never free creates an interrupt storm followed by
> an "irq XX: nobody cared" oops.
> 
> Since it is non-functional at the moment anyways, lets just completely
> disable the IRQ code in tpm_tis_core for now.
> 
> Fixes: dda8b2af395b ("tpm: Revert "tpm_tis_core: Set TPM_CHIP_FLAG_IRQ before probing for interrupts"")
> Signed-off-by: Hans de Goede <hdegoede@redhat.com>
> ---
> Note I'm working with Lenovo to try and get to the bottom of this.
> ---

OK if I recall correctly the reason for reverting was that the fixes
Stefan was sending were broken and no access to hardware were the
issues would be visible. The reason for not doing anything til this
day is that we don't have T490 available.

The lack of devm_free_irq() is an by itself, so please instead send
a fix that adds that to the code instead of "#if 0" crap.

Thank you.

/Jarkko

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] tpm_tis_core: Disable broken IRQ handling code
  2020-04-10 16:38 ` Jason Gunthorpe
@ 2020-04-10 21:08   ` Jarkko Sakkinen
  2020-04-10 21:12   ` Jarkko Sakkinen
  2020-04-14 17:44   ` Jerry Snitselaar
  2 siblings, 0 replies; 22+ messages in thread
From: Jarkko Sakkinen @ 2020-04-10 21:08 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Hans de Goede, Peter Huewe, Jerry Snitselaar, Stefan Berger,
	Arnd Bergmann, Greg Kroah-Hartman, linux-integrity

On Fri, Apr 10, 2020 at 01:38:26PM -0300, Jason Gunthorpe wrote:
> On Thu, Apr 09, 2020 at 11:10:44PM +0200, Hans de Goede wrote:
> > Since commit dda8b2af395b ("tpm: Revert "tpm_tis_core: Set
> > TPM_CHIP_FLAG_IRQ before probing for interrupts"") we no longer set
> > the TPM_CHIP_FLAG_IRQ ever.
> 
> This all used to work..

Yes, up until T490 issues.

/Jarkko

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] tpm_tis_core: Disable broken IRQ handling code
  2020-04-10 21:06 ` Jarkko Sakkinen
@ 2020-04-10 21:09   ` Jarkko Sakkinen
  2020-04-12 14:55     ` Jarkko Sakkinen
  2020-05-07 14:02   ` Hans de Goede
  2020-05-07 14:02   ` Hans de Goede
  2 siblings, 1 reply; 22+ messages in thread
From: Jarkko Sakkinen @ 2020-04-10 21:09 UTC (permalink / raw)
  To: Hans de Goede
  Cc: Peter Huewe, Jason Gunthorpe, Jerry Snitselaar, Stefan Berger,
	Arnd Bergmann, Greg Kroah-Hartman, linux-integrity

On Sat, Apr 11, 2020 at 12:07:02AM +0300, Jarkko Sakkinen wrote:
> On Thu, Apr 09, 2020 at 11:10:44PM +0200, Hans de Goede wrote:
> > Since commit dda8b2af395b ("tpm: Revert "tpm_tis_core: Set
> > TPM_CHIP_FLAG_IRQ before probing for interrupts"") we no longer set
> > the TPM_CHIP_FLAG_IRQ ever.
> > 
> > So the whole IRQ probing code is not useful, worse we rely on the
> > IRQ-test path of tpm_tis_send() to call disable_interrupts() if
> > interrupts do not work, but that path never gets entered because we
> > never set the TPM_CHIP_FLAG_IRQ.
> > 
> > So the remaining IRQ probe code calls request_irq() and never calls
> > free_irq() even when the interrupt is not working.
> > 
> > On some systems, e.g. the Lenovo X1 8th gen,  the interrupt we try
> > to use and never free creates an interrupt storm followed by
> > an "irq XX: nobody cared" oops.
> > 
> > Since it is non-functional at the moment anyways, lets just completely
> > disable the IRQ code in tpm_tis_core for now.
> > 
> > Fixes: dda8b2af395b ("tpm: Revert "tpm_tis_core: Set TPM_CHIP_FLAG_IRQ before probing for interrupts"")
> > Signed-off-by: Hans de Goede <hdegoede@redhat.com>
> > ---
> > Note I'm working with Lenovo to try and get to the bottom of this.
> > ---
> 
> OK if I recall correctly the reason for reverting was that the fixes
> Stefan was sending were broken and no access to hardware were the
> issues would be visible. The reason for not doing anything til this
> day is that we don't have T490 available.
> 
> The lack of devm_free_irq() is an by itself, so please instead send
> a fix that adds that to the code instead of "#if 0" crap.

If we were to go with your proposal, then you'd have to do "if 0" to
bunch of other places. Otherwise, the change would be incomplete. Thus,
it is now more sane just to free that IRQ.

/Jarkko


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] tpm_tis_core: Disable broken IRQ handling code
  2020-04-10 16:38 ` Jason Gunthorpe
  2020-04-10 21:08   ` Jarkko Sakkinen
@ 2020-04-10 21:12   ` Jarkko Sakkinen
  2020-04-14 17:44   ` Jerry Snitselaar
  2 siblings, 0 replies; 22+ messages in thread
From: Jarkko Sakkinen @ 2020-04-10 21:12 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Hans de Goede, Peter Huewe, Jerry Snitselaar, Stefan Berger,
	Arnd Bergmann, Greg Kroah-Hartman, linux-integrity

On Fri, Apr 10, 2020 at 01:38:26PM -0300, Jason Gunthorpe wrote:
> On Thu, Apr 09, 2020 at 11:10:44PM +0200, Hans de Goede wrote:
> > Since commit dda8b2af395b ("tpm: Revert "tpm_tis_core: Set
> > TPM_CHIP_FLAG_IRQ before probing for interrupts"") we no longer set
> > the TPM_CHIP_FLAG_IRQ ever.
> 
> This all used to work..
> 
> > So the whole IRQ probing code is not useful, worse we rely on the
> > IRQ-test path of tpm_tis_send() to call disable_interrupts() if
> > interrupts do not work, but that path never gets entered because we
> > never set the TPM_CHIP_FLAG_IRQ.
> 
> The IRQ probing code should be deleted.
>  
> > On some systems, e.g. the Lenovo X1 8th gen,  the interrupt we try
> > to use and never free creates an interrupt storm followed by
> > an "irq XX: nobody cared" oops.
> 
> Is this related to probing?

Found it:

https://bugzilla.kernel.org/show_bug.cgi?id=203561

Has existed for a quite a while.

/Jarkko

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] tpm_tis_core: Disable broken IRQ handling code
  2020-04-10 21:09   ` Jarkko Sakkinen
@ 2020-04-12 14:55     ` Jarkko Sakkinen
  0 siblings, 0 replies; 22+ messages in thread
From: Jarkko Sakkinen @ 2020-04-12 14:55 UTC (permalink / raw)
  To: Hans de Goede
  Cc: Peter Huewe, Jason Gunthorpe, Jerry Snitselaar, Stefan Berger,
	Arnd Bergmann, Greg Kroah-Hartman, linux-integrity

On Sat, Apr 11, 2020 at 12:09:36AM +0300, Jarkko Sakkinen wrote:
> On Sat, Apr 11, 2020 at 12:07:02AM +0300, Jarkko Sakkinen wrote:
> > On Thu, Apr 09, 2020 at 11:10:44PM +0200, Hans de Goede wrote:
> > > Since commit dda8b2af395b ("tpm: Revert "tpm_tis_core: Set
> > > TPM_CHIP_FLAG_IRQ before probing for interrupts"") we no longer set
> > > the TPM_CHIP_FLAG_IRQ ever.
> > > 
> > > So the whole IRQ probing code is not useful, worse we rely on the
> > > IRQ-test path of tpm_tis_send() to call disable_interrupts() if
> > > interrupts do not work, but that path never gets entered because we
> > > never set the TPM_CHIP_FLAG_IRQ.
> > > 
> > > So the remaining IRQ probe code calls request_irq() and never calls
> > > free_irq() even when the interrupt is not working.
> > > 
> > > On some systems, e.g. the Lenovo X1 8th gen,  the interrupt we try
> > > to use and never free creates an interrupt storm followed by
> > > an "irq XX: nobody cared" oops.
> > > 
> > > Since it is non-functional at the moment anyways, lets just completely
> > > disable the IRQ code in tpm_tis_core for now.
> > > 
> > > Fixes: dda8b2af395b ("tpm: Revert "tpm_tis_core: Set TPM_CHIP_FLAG_IRQ before probing for interrupts"")
> > > Signed-off-by: Hans de Goede <hdegoede@redhat.com>
> > > ---
> > > Note I'm working with Lenovo to try and get to the bottom of this.
> > > ---
> > 
> > OK if I recall correctly the reason for reverting was that the fixes
> > Stefan was sending were broken and no access to hardware were the
> > issues would be visible. The reason for not doing anything til this
> > day is that we don't have T490 available.
> > 
> > The lack of devm_free_irq() is an by itself, so please instead send
> > a fix that adds that to the code instead of "#if 0" crap.
> 
> If we were to go with your proposal, then you'd have to do "if 0" to
> bunch of other places. Otherwise, the change would be incomplete. Thus,
> it is now more sane just to free that IRQ.

Also, already the commit that has the fixes tag disables IRQ handling
and that was for the reason that some hardware that we (basically me
and Jerry) did not have access to. That's why there is no proper fix.

/Jarkko

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] tpm_tis_core: Disable broken IRQ handling code
  2020-04-10 16:38 ` Jason Gunthorpe
  2020-04-10 21:08   ` Jarkko Sakkinen
  2020-04-10 21:12   ` Jarkko Sakkinen
@ 2020-04-14 17:44   ` Jerry Snitselaar
  2020-04-15  8:57     ` Jan Lübbe
  2020-04-16 17:03     ` Jarkko Sakkinen
  2 siblings, 2 replies; 22+ messages in thread
From: Jerry Snitselaar @ 2020-04-14 17:44 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Hans de Goede, Peter Huewe, Jarkko Sakkinen, Stefan Berger,
	Arnd Bergmann, Greg Kroah-Hartman, linux-integrity

On Fri Apr 10 20, Jason Gunthorpe wrote:
>On Thu, Apr 09, 2020 at 11:10:44PM +0200, Hans de Goede wrote:
>> Since commit dda8b2af395b ("tpm: Revert "tpm_tis_core: Set
>> TPM_CHIP_FLAG_IRQ before probing for interrupts"") we no longer set
>> the TPM_CHIP_FLAG_IRQ ever.
>
>This all used to work..

IIRC this goes all the way back to 570a36097f30 ("tpm: drop 'irq' from struct tpm_vendor_specific")
when the flag was added. There was never anything initially setting it in the tpm_tis code.
There were checks added, but the only place it got set was where it did the interrupt test in
tpm_tis_send, and it can only get down to that code if the flag is set. Stefan's patch was enabling
the flag, but with the flag enabled some systems were seeing interrupt storms. I believe it has
been seen with both the t490 and an internal system that Dan Williams was working on at Intel.
Without access to hw seeing the problem the decision was to revert Stefan's patches while
we try to figure out the issues.

>
>> So the whole IRQ probing code is not useful, worse we rely on the
>> IRQ-test path of tpm_tis_send() to call disable_interrupts() if
>> interrupts do not work, but that path never gets entered because we
>> never set the TPM_CHIP_FLAG_IRQ.
>
>The IRQ probing code should be deleted.
>
>> On some systems, e.g. the Lenovo X1 8th gen,  the interrupt we try
>> to use and never free creates an interrupt storm followed by
>> an "irq XX: nobody cared" oops.
>
>Is this related to probing?
>
>Jason
>


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] tpm_tis_core: Disable broken IRQ handling code
  2020-04-14 17:44   ` Jerry Snitselaar
@ 2020-04-15  8:57     ` Jan Lübbe
  2020-04-16 17:03     ` Jarkko Sakkinen
  1 sibling, 0 replies; 22+ messages in thread
From: Jan Lübbe @ 2020-04-15  8:57 UTC (permalink / raw)
  To: Jerry Snitselaar, Jason Gunthorpe
  Cc: Hans de Goede, Peter Huewe, Jarkko Sakkinen, Stefan Berger,
	Arnd Bergmann, Greg Kroah-Hartman, linux-integrity

On Tue, 2020-04-14 at 10:44 -0700, Jerry Snitselaar wrote:
> On Fri Apr 10 20, Jason Gunthorpe wrote:
> 
> > On Thu, Apr 09, 2020 at 11:10:44PM +0200, Hans de Goede wrote:
> > > Since commit dda8b2af395b ("tpm: Revert "tpm_tis_core: Set
> > > TPM_CHIP_FLAG_IRQ before probing for interrupts"") we no longer set
> > > the TPM_CHIP_FLAG_IRQ ever.
> > This all used to work..
> 
> 
> IIRC this goes all the way back to 570a36097f30 ("tpm: drop 'irq' from struct tpm_vendor_specific")
> when the flag was added. There was never anything initially setting it in the tpm_tis code.
> There were checks added, but the only place it got set was where it did the interrupt test in
> tpm_tis_send, and it can only get down to that code if the flag is set. Stefan's patch was enabling
> the flag, but with the flag enabled some systems were seeing interrupt storms. I believe it has
> been seen with both the t490 and an internal system that Dan Williams was working on at Intel.
> Without access to hw seeing the problem the decision was to revert Stefan's patches while
> we try to figure out the issues.

I stumbled across this when debugging a sporadic "tpm tpm0:
tpm_try_transmit: send(): error -62 (-ETIME)" we see on a i.MX6 board
with a TPM2 connected via SPI. It seems that comes from down in
wait_for_tpm_stat() when the TPM doesn't become ready quickly enough.

As this board actually has the IRQ connected, but is falling back to
polling, I took another look and what's wrong with the IRQ support. I
don't have it working yet, but have found some things I'd like to share
(and maybe get some fresh ideas).

I used Stefan Berger's "tpm_tis_core: Turn on the TPM before probing
IRQ's" and "tpm_tis_core: Set TPM_CHIP_FLAG_IRQ before probing for
interrupts" to get it into the IRQ verification code in tpm_tis_send().

Then I changed the IRQ to threaded with IRQF_ONESHOT, as the handler is
calling into the SPI stack (which can sleep). A related problem exists
in wait_for_tpm_stat(), which uses the potentially sleeping
wait_for_tpm_stat_cond() as the condition check in
wait_event_interruptible_timeout(). This leads to a:
[   41.799086] do not call blocking ops when !TASK_RUNNING; state=1 set at [<f02406bc>] prepare_to_wait_event+0x84/0x1a4
seems to lead to working transfers, but nevertheless needs to be
changed to be compatible with sleeping busses.

So I think this probably only worked on systems which don't use SPI.

What still confuses me, is that after a few successful transfers I keep
getting IRQs with TPM_INTF_CMD_READY_INT set, although the handler
clear that bit. If i read the TIS spec correctly, it should only be
asserted by a 0->1 transition of
TPM_INT_STATUS_x.commandReadyIntOccured flag, which shouldn't happend
so often.

Regards,
Jan


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] tpm_tis_core: Disable broken IRQ handling code
  2020-04-14 17:44   ` Jerry Snitselaar
  2020-04-15  8:57     ` Jan Lübbe
@ 2020-04-16 17:03     ` Jarkko Sakkinen
  1 sibling, 0 replies; 22+ messages in thread
From: Jarkko Sakkinen @ 2020-04-16 17:03 UTC (permalink / raw)
  To: Jerry Snitselaar
  Cc: Jason Gunthorpe, Hans de Goede, Peter Huewe, Stefan Berger,
	Arnd Bergmann, Greg Kroah-Hartman, linux-integrity

On Tue, Apr 14, 2020 at 10:44:33AM -0700, Jerry Snitselaar wrote:
> On Fri Apr 10 20, Jason Gunthorpe wrote:
> > On Thu, Apr 09, 2020 at 11:10:44PM +0200, Hans de Goede wrote:
> > > Since commit dda8b2af395b ("tpm: Revert "tpm_tis_core: Set
> > > TPM_CHIP_FLAG_IRQ before probing for interrupts"") we no longer set
> > > the TPM_CHIP_FLAG_IRQ ever.
> > 
> > This all used to work..
> 
> IIRC this goes all the way back to 570a36097f30 ("tpm: drop 'irq' from struct tpm_vendor_specific")
> when the flag was added. There was never anything initially setting it in the tpm_tis code.
> There were checks added, but the only place it got set was where it did the interrupt test in
> tpm_tis_send, and it can only get down to that code if the flag is set. Stefan's patch was enabling
> the flag, but with the flag enabled some systems were seeing interrupt storms. I believe it has
> been seen with both the t490 and an internal system that Dan Williams was working on at Intel.
> Without access to hw seeing the problem the decision was to revert Stefan's patches while
> we try to figure out the issues.

Thanks Jerry.

Yup, kinda hard to debug irq issues without local access
to the whole platform. They can be nasty issues to debug
in the 1st place.

/Jarkko

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] tpm_tis_core: Disable broken IRQ handling code
  2020-04-10 21:06 ` Jarkko Sakkinen
  2020-04-10 21:09   ` Jarkko Sakkinen
@ 2020-05-07 14:02   ` Hans de Goede
  2020-05-07 14:02   ` Hans de Goede
  2 siblings, 0 replies; 22+ messages in thread
From: Hans de Goede @ 2020-05-07 14:02 UTC (permalink / raw)
  To: Jarkko Sakkinen
  Cc: Peter Huewe, Jason Gunthorpe, Jerry Snitselaar, Stefan Berger,
	Arnd Bergmann, Greg Kroah-Hartman, linux-integrity, Mark Pearson

Hi All,

On 4/10/20 11:06 PM, Jarkko Sakkinen wrote:
> On Thu, Apr 09, 2020 at 11:10:44PM +0200, Hans de Goede wrote:
>> Since commit dda8b2af395b ("tpm: Revert "tpm_tis_core: Set
>> TPM_CHIP_FLAG_IRQ before probing for interrupts"") we no longer set
>> the TPM_CHIP_FLAG_IRQ ever.
>>
>> So the whole IRQ probing code is not useful, worse we rely on the
>> IRQ-test path of tpm_tis_send() to call disable_interrupts() if
>> interrupts do not work, but that path never gets entered because we
>> never set the TPM_CHIP_FLAG_IRQ.
>>
>> So the remaining IRQ probe code calls request_irq() and never calls
>> free_irq() even when the interrupt is not working.
>>
>> On some systems, e.g. the Lenovo X1 8th gen,  the interrupt we try
>> to use and never free creates an interrupt storm followed by
>> an "irq XX: nobody cared" oops.
>>
>> Since it is non-functional at the moment anyways, lets just completely
>> disable the IRQ code in tpm_tis_core for now.
>>
>> Fixes: dda8b2af395b ("tpm: Revert "tpm_tis_core: Set TPM_CHIP_FLAG_IRQ before probing for interrupts"")
>> Signed-off-by: Hans de Goede <hdegoede@redhat.com>
>> ---
>> Note I'm working with Lenovo to try and get to the bottom of this.
>> ---
> 
> OK if I recall correctly the reason for reverting was that the fixes
> Stefan was sending were broken and no access to hardware were the
> issues would be visible. The reason for not doing anything til this
> day is that we don't have T490 available.

So as promised I have been in contact with Lenovo about this.

Specifically I have been in contact with Lenovo about seeing an
IRQ storm when the tpm_tis code tries to use the IRQ on a X1 carbon
8th gen (X1C8), because of the now public plan that Lenovo will
offer ordering this model with Fedora pre-installed:
https://lwn.net/Articles/818595/

On the X1C8 the problem has been diagnosed to be a misconfigured
GPIO pin on the CPU (the SoC). The X1C8 uses an SPI connected
TPM chip with its IRQ connected to a GPIO on the SoC which is
configured in Direct IRQ mode, so that it directly asserts
IRQs on one of the APIC IRQs.  The problem is that due to the
misconfiguration as soon as the IRQ is enabled it fires
continuously.

For the X1C8 this should be fixed in the BIOS of the first
batch which gets shipped out to customers so there we should
not have to worry about this.

It is likely (but not yet confirmed) that the issue on the T490
is the same, although on my test X1C8 device I got an IRQ storm,
followed by the kernel disabling the IRQ, not a non booting system.
I guess this might be due to kernel configuration differences.

Assuming that the issue on the T490 is the same, we might see a
BIOS update fixing this, but given that non-booting is
'not good ("tm")' even if there will be a BIOS fix we should
still do something at the kernel level to also work with the
older unfixed BIOS which is already out there.

I've been thinking about this and I'm afraid that the only thing
what we can do is add a DMI product-name (product-version for Lenovo)
string based blacklist for IRQ usage to drivers/char/tpm/tpm_tis.c
and set tpm_info.irq = -1 for devices on that list.

My plan is to prepare a RFC patch of such a blacklist, while we
wait for confirmation that the root cause on the T490 is the same
as on the X1C8, but before I work on that I'm wondering if
people agree that that is the best approach, or if there are
other suggestions for dealing with this ?

Regards,

Hans


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] tpm_tis_core: Disable broken IRQ handling code
  2020-04-10 21:06 ` Jarkko Sakkinen
  2020-04-10 21:09   ` Jarkko Sakkinen
  2020-05-07 14:02   ` Hans de Goede
@ 2020-05-07 14:02   ` Hans de Goede
  2020-05-07 14:17     ` Jerry Snitselaar
  2020-06-04  9:17     ` Jerry Snitselaar
  2 siblings, 2 replies; 22+ messages in thread
From: Hans de Goede @ 2020-05-07 14:02 UTC (permalink / raw)
  To: Jarkko Sakkinen
  Cc: Peter Huewe, Jason Gunthorpe, Jerry Snitselaar, Stefan Berger,
	Arnd Bergmann, Greg Kroah-Hartman, linux-integrity, Mark Pearson

Hi All,

On 4/10/20 11:06 PM, Jarkko Sakkinen wrote:
> On Thu, Apr 09, 2020 at 11:10:44PM +0200, Hans de Goede wrote:
>> Since commit dda8b2af395b ("tpm: Revert "tpm_tis_core: Set
>> TPM_CHIP_FLAG_IRQ before probing for interrupts"") we no longer set
>> the TPM_CHIP_FLAG_IRQ ever.
>>
>> So the whole IRQ probing code is not useful, worse we rely on the
>> IRQ-test path of tpm_tis_send() to call disable_interrupts() if
>> interrupts do not work, but that path never gets entered because we
>> never set the TPM_CHIP_FLAG_IRQ.
>>
>> So the remaining IRQ probe code calls request_irq() and never calls
>> free_irq() even when the interrupt is not working.
>>
>> On some systems, e.g. the Lenovo X1 8th gen,  the interrupt we try
>> to use and never free creates an interrupt storm followed by
>> an "irq XX: nobody cared" oops.
>>
>> Since it is non-functional at the moment anyways, lets just completely
>> disable the IRQ code in tpm_tis_core for now.
>>
>> Fixes: dda8b2af395b ("tpm: Revert "tpm_tis_core: Set TPM_CHIP_FLAG_IRQ before probing for interrupts"")
>> Signed-off-by: Hans de Goede <hdegoede@redhat.com>
>> ---
>> Note I'm working with Lenovo to try and get to the bottom of this.
>> ---
> 
> OK if I recall correctly the reason for reverting was that the fixes
> Stefan was sending were broken and no access to hardware were the
> issues would be visible. The reason for not doing anything til this
> day is that we don't have T490 available.

So as promised I have been in contact with Lenovo about this.

Specifically I have been in contact with Lenovo about seeing an
IRQ storm when the tpm_tis code tries to use the IRQ on a X1 carbon
8th gen (X1C8), because of the now public plan that Lenovo will
offer ordering this model with Fedora pre-installed:
https://lwn.net/Articles/818595/

On the X1C8 the problem has been diagnosed to be a misconfigured
GPIO pin on the CPU (the SoC). The X1C8 uses an SPI connected
TPM chip with its IRQ connected to a GPIO on the SoC which is
configured in Direct IRQ mode, so that it directly asserts
IRQs on one of the APIC IRQs.  The problem is that due to the
misconfiguration as soon as the IRQ is enabled it fires
continuously.

For the X1C8 this should be fixed in the BIOS of the first
batch which gets shipped out to customers so there we should
not have to worry about this.

It is likely (but not yet confirmed) that the issue on the T490
is the same, although on my test X1C8 device I got an IRQ storm,
followed by the kernel disabling the IRQ, not a non booting system.
I guess this might be due to kernel configuration differences.

Assuming that the issue on the T490 is the same, we might see a
BIOS update fixing this, but given that non-booting is
'not good ("tm")' even if there will be a BIOS fix we should
still do something at the kernel level to also work with the
older unfixed BIOS which is already out there.

I've been thinking about this and I'm afraid that the only thing
what we can do is add a DMI product-name (product-version for Lenovo)
string based blacklist for IRQ usage to drivers/char/tpm/tpm_tis.c
and set tpm_info.irq = -1 for devices on that list.

My plan is to prepare a RFC patch of such a blacklist, while we
wait for confirmation that the root cause on the T490 is the same
as on the X1C8, but before I work on that I'm wondering if
people agree that that is the best approach, or if there are
other suggestions for dealing with this ?

Regards,

Hans


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] tpm_tis_core: Disable broken IRQ handling code
  2020-05-07 14:02   ` Hans de Goede
@ 2020-05-07 14:17     ` Jerry Snitselaar
  2020-05-07 17:38       ` Hans de Goede
  2020-05-07 21:51       ` Dan Williams
  2020-06-04  9:17     ` Jerry Snitselaar
  1 sibling, 2 replies; 22+ messages in thread
From: Jerry Snitselaar @ 2020-05-07 14:17 UTC (permalink / raw)
  To: Hans de Goede
  Cc: Jarkko Sakkinen, Peter Huewe, Jason Gunthorpe, Stefan Berger,
	Arnd Bergmann, Greg Kroah-Hartman, linux-integrity, Mark Pearson,
	Dan Williams

On Thu May 07 20, Hans de Goede wrote:
>Hi All,
>
>On 4/10/20 11:06 PM, Jarkko Sakkinen wrote:
>>On Thu, Apr 09, 2020 at 11:10:44PM +0200, Hans de Goede wrote:
>>>Since commit dda8b2af395b ("tpm: Revert "tpm_tis_core: Set
>>>TPM_CHIP_FLAG_IRQ before probing for interrupts"") we no longer set
>>>the TPM_CHIP_FLAG_IRQ ever.
>>>
>>>So the whole IRQ probing code is not useful, worse we rely on the
>>>IRQ-test path of tpm_tis_send() to call disable_interrupts() if
>>>interrupts do not work, but that path never gets entered because we
>>>never set the TPM_CHIP_FLAG_IRQ.
>>>
>>>So the remaining IRQ probe code calls request_irq() and never calls
>>>free_irq() even when the interrupt is not working.
>>>
>>>On some systems, e.g. the Lenovo X1 8th gen,  the interrupt we try
>>>to use and never free creates an interrupt storm followed by
>>>an "irq XX: nobody cared" oops.
>>>
>>>Since it is non-functional at the moment anyways, lets just completely
>>>disable the IRQ code in tpm_tis_core for now.
>>>
>>>Fixes: dda8b2af395b ("tpm: Revert "tpm_tis_core: Set TPM_CHIP_FLAG_IRQ before probing for interrupts"")
>>>Signed-off-by: Hans de Goede <hdegoede@redhat.com>
>>>---
>>>Note I'm working with Lenovo to try and get to the bottom of this.
>>>---
>>
>>OK if I recall correctly the reason for reverting was that the fixes
>>Stefan was sending were broken and no access to hardware were the
>>issues would be visible. The reason for not doing anything til this
>>day is that we don't have T490 available.
>
>So as promised I have been in contact with Lenovo about this.
>
>Specifically I have been in contact with Lenovo about seeing an
>IRQ storm when the tpm_tis code tries to use the IRQ on a X1 carbon
>8th gen (X1C8), because of the now public plan that Lenovo will
>offer ordering this model with Fedora pre-installed:
>https://lwn.net/Articles/818595/
>
>On the X1C8 the problem has been diagnosed to be a misconfigured
>GPIO pin on the CPU (the SoC). The X1C8 uses an SPI connected
>TPM chip with its IRQ connected to a GPIO on the SoC which is
>configured in Direct IRQ mode, so that it directly asserts
>IRQs on one of the APIC IRQs.  The problem is that due to the
>misconfiguration as soon as the IRQ is enabled it fires
>continuously.
>
>For the X1C8 this should be fixed in the BIOS of the first
>batch which gets shipped out to customers so there we should
>not have to worry about this.
>
>It is likely (but not yet confirmed) that the issue on the T490
>is the same, although on my test X1C8 device I got an IRQ storm,
>followed by the kernel disabling the IRQ, not a non booting system.
>I guess this might be due to kernel configuration differences.
>
>Assuming that the issue on the T490 is the same, we might see a
>BIOS update fixing this, but given that non-booting is
>'not good ("tm")' even if there will be a BIOS fix we should
>still do something at the kernel level to also work with the
>older unfixed BIOS which is already out there.
>
>I've been thinking about this and I'm afraid that the only thing
>what we can do is add a DMI product-name (product-version for Lenovo)
>string based blacklist for IRQ usage to drivers/char/tpm/tpm_tis.c
>and set tpm_info.irq = -1 for devices on that list.
>
>My plan is to prepare a RFC patch of such a blacklist, while we
>wait for confirmation that the root cause on the T490 is the same
>as on the X1C8, but before I work on that I'm wondering if
>people agree that that is the best approach, or if there are
>other suggestions for dealing with this ?
>
>Regards,
>
>Hans
>

Dan,

Could this be the cause of the problem on the system you were
seeing the issue with, or was that using PTT?

Regards,
Jerry


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] tpm_tis_core: Disable broken IRQ handling code
  2020-05-07 14:17     ` Jerry Snitselaar
@ 2020-05-07 17:38       ` Hans de Goede
  2020-05-07 21:51       ` Dan Williams
  1 sibling, 0 replies; 22+ messages in thread
From: Hans de Goede @ 2020-05-07 17:38 UTC (permalink / raw)
  To: Jerry Snitselaar
  Cc: Jarkko Sakkinen, Peter Huewe, Jason Gunthorpe, Stefan Berger,
	Arnd Bergmann, Greg Kroah-Hartman, linux-integrity, Mark Pearson,
	Dan Williams

Hi,

On 5/7/20 4:17 PM, Jerry Snitselaar wrote:
> On Thu May 07 20, Hans de Goede wrote:
>> Hi All,
>>
>> On 4/10/20 11:06 PM, Jarkko Sakkinen wrote:
>>> On Thu, Apr 09, 2020 at 11:10:44PM +0200, Hans de Goede wrote:
>>>> Since commit dda8b2af395b ("tpm: Revert "tpm_tis_core: Set
>>>> TPM_CHIP_FLAG_IRQ before probing for interrupts"") we no longer set
>>>> the TPM_CHIP_FLAG_IRQ ever.
>>>>
>>>> So the whole IRQ probing code is not useful, worse we rely on the
>>>> IRQ-test path of tpm_tis_send() to call disable_interrupts() if
>>>> interrupts do not work, but that path never gets entered because we
>>>> never set the TPM_CHIP_FLAG_IRQ.
>>>>
>>>> So the remaining IRQ probe code calls request_irq() and never calls
>>>> free_irq() even when the interrupt is not working.
>>>>
>>>> On some systems, e.g. the Lenovo X1 8th gen,  the interrupt we try
>>>> to use and never free creates an interrupt storm followed by
>>>> an "irq XX: nobody cared" oops.
>>>>
>>>> Since it is non-functional at the moment anyways, lets just completely
>>>> disable the IRQ code in tpm_tis_core for now.
>>>>
>>>> Fixes: dda8b2af395b ("tpm: Revert "tpm_tis_core: Set TPM_CHIP_FLAG_IRQ before probing for interrupts"")
>>>> Signed-off-by: Hans de Goede <hdegoede@redhat.com>
>>>> ---
>>>> Note I'm working with Lenovo to try and get to the bottom of this.
>>>> ---
>>>
>>> OK if I recall correctly the reason for reverting was that the fixes
>>> Stefan was sending were broken and no access to hardware were the
>>> issues would be visible. The reason for not doing anything til this
>>> day is that we don't have T490 available.
>>
>> So as promised I have been in contact with Lenovo about this.
>>
>> Specifically I have been in contact with Lenovo about seeing an
>> IRQ storm when the tpm_tis code tries to use the IRQ on a X1 carbon
>> 8th gen (X1C8), because of the now public plan that Lenovo will
>> offer ordering this model with Fedora pre-installed:
>> https://lwn.net/Articles/818595/
>>
>> On the X1C8 the problem has been diagnosed to be a misconfigured
>> GPIO pin on the CPU (the SoC). The X1C8 uses an SPI connected
>> TPM chip with its IRQ connected to a GPIO on the SoC which is
>> configured in Direct IRQ mode, so that it directly asserts
>> IRQs on one of the APIC IRQs.  The problem is that due to the
>> misconfiguration as soon as the IRQ is enabled it fires
>> continuously.
>>
>> For the X1C8 this should be fixed in the BIOS of the first
>> batch which gets shipped out to customers so there we should
>> not have to worry about this.
>>
>> It is likely (but not yet confirmed) that the issue on the T490
>> is the same, although on my test X1C8 device I got an IRQ storm,
>> followed by the kernel disabling the IRQ, not a non booting system.
>> I guess this might be due to kernel configuration differences.
>>
>> Assuming that the issue on the T490 is the same, we might see a
>> BIOS update fixing this, but given that non-booting is
>> 'not good ("tm")' even if there will be a BIOS fix we should
>> still do something at the kernel level to also work with the
>> older unfixed BIOS which is already out there.
>>
>> I've been thinking about this and I'm afraid that the only thing
>> what we can do is add a DMI product-name (product-version for Lenovo)
>> string based blacklist for IRQ usage to drivers/char/tpm/tpm_tis.c
>> and set tpm_info.irq = -1 for devices on that list.
>>
>> My plan is to prepare a RFC patch of such a blacklist, while we
>> wait for confirmation that the root cause on the T490 is the same
>> as on the X1C8, but before I work on that I'm wondering if
>> people agree that that is the best approach, or if there are
>> other suggestions for dealing with this ?
>>
>> Regards,
>>
>> Hans
>>
> 
> Dan,
> 
> Could this be the cause of the problem on the system you were
> seeing the issue with, or was that using PTT?

The system I was seeing the issue on was the X1C8, so yes this
is (was with updated BIOS) the cause of the issue I was seeing.

Regards,

Hans


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] tpm_tis_core: Disable broken IRQ handling code
  2020-05-07 14:17     ` Jerry Snitselaar
  2020-05-07 17:38       ` Hans de Goede
@ 2020-05-07 21:51       ` Dan Williams
  2020-05-07 23:13         ` Jerry Snitselaar
                           ` (2 more replies)
  1 sibling, 3 replies; 22+ messages in thread
From: Dan Williams @ 2020-05-07 21:51 UTC (permalink / raw)
  To: Jerry Snitselaar
  Cc: Hans de Goede, Jarkko Sakkinen, Peter Huewe, Jason Gunthorpe,
	Stefan Berger, Arnd Bergmann, Greg Kroah-Hartman,
	linux-integrity, Mark Pearson

On Thu, May 7, 2020 at 7:17 AM Jerry Snitselaar <jsnitsel@redhat.com> wrote:
>
> On Thu May 07 20, Hans de Goede wrote:
> >Hi All,
> >
> >On 4/10/20 11:06 PM, Jarkko Sakkinen wrote:
> >>On Thu, Apr 09, 2020 at 11:10:44PM +0200, Hans de Goede wrote:
> >>>Since commit dda8b2af395b ("tpm: Revert "tpm_tis_core: Set
> >>>TPM_CHIP_FLAG_IRQ before probing for interrupts"") we no longer set
> >>>the TPM_CHIP_FLAG_IRQ ever.
> >>>
> >>>So the whole IRQ probing code is not useful, worse we rely on the
> >>>IRQ-test path of tpm_tis_send() to call disable_interrupts() if
> >>>interrupts do not work, but that path never gets entered because we
> >>>never set the TPM_CHIP_FLAG_IRQ.
> >>>
> >>>So the remaining IRQ probe code calls request_irq() and never calls
> >>>free_irq() even when the interrupt is not working.
> >>>
> >>>On some systems, e.g. the Lenovo X1 8th gen,  the interrupt we try
> >>>to use and never free creates an interrupt storm followed by
> >>>an "irq XX: nobody cared" oops.
> >>>
> >>>Since it is non-functional at the moment anyways, lets just completely
> >>>disable the IRQ code in tpm_tis_core for now.
> >>>
> >>>Fixes: dda8b2af395b ("tpm: Revert "tpm_tis_core: Set TPM_CHIP_FLAG_IRQ before probing for interrupts"")
> >>>Signed-off-by: Hans de Goede <hdegoede@redhat.com>
> >>>---
> >>>Note I'm working with Lenovo to try and get to the bottom of this.
> >>>---
> >>
> >>OK if I recall correctly the reason for reverting was that the fixes
> >>Stefan was sending were broken and no access to hardware were the
> >>issues would be visible. The reason for not doing anything til this
> >>day is that we don't have T490 available.
> >
> >So as promised I have been in contact with Lenovo about this.
> >
> >Specifically I have been in contact with Lenovo about seeing an
> >IRQ storm when the tpm_tis code tries to use the IRQ on a X1 carbon
> >8th gen (X1C8), because of the now public plan that Lenovo will
> >offer ordering this model with Fedora pre-installed:
> >https://lwn.net/Articles/818595/
> >
> >On the X1C8 the problem has been diagnosed to be a misconfigured
> >GPIO pin on the CPU (the SoC). The X1C8 uses an SPI connected
> >TPM chip with its IRQ connected to a GPIO on the SoC which is
> >configured in Direct IRQ mode, so that it directly asserts
> >IRQs on one of the APIC IRQs.  The problem is that due to the
> >misconfiguration as soon as the IRQ is enabled it fires
> >continuously.
> >
> >For the X1C8 this should be fixed in the BIOS of the first
> >batch which gets shipped out to customers so there we should
> >not have to worry about this.
> >
> >It is likely (but not yet confirmed) that the issue on the T490
> >is the same, although on my test X1C8 device I got an IRQ storm,
> >followed by the kernel disabling the IRQ, not a non booting system.
> >I guess this might be due to kernel configuration differences.
> >
> >Assuming that the issue on the T490 is the same, we might see a
> >BIOS update fixing this, but given that non-booting is
> >'not good ("tm")' even if there will be a BIOS fix we should
> >still do something at the kernel level to also work with the
> >older unfixed BIOS which is already out there.
> >
> >I've been thinking about this and I'm afraid that the only thing
> >what we can do is add a DMI product-name (product-version for Lenovo)
> >string based blacklist for IRQ usage to drivers/char/tpm/tpm_tis.c
> >and set tpm_info.irq = -1 for devices on that list.
> >
> >My plan is to prepare a RFC patch of such a blacklist, while we
> >wait for confirmation that the root cause on the T490 is the same
> >as on the X1C8, but before I work on that I'm wondering if
> >people agree that that is the best approach, or if there are
> >other suggestions for dealing with this ?
> >
> >Regards,
> >
> >Hans
> >
>
> Dan,
>
> Could this be the cause of the problem on the system you were
> seeing the issue with, or was that using PTT?

It sounds similar, I'm just not immediately aware of where I can find
out how the GPIOs are routed on that development board. I'll poke
around.

What's PTT?

My concern with a blacklist is that the existing tpm_tis module
parameter to disable interrupts, IIRC, did not help mitigate this
problem. So I would think that if there is a blackilst it should at
least be amenable by module parameter for new platforms, or that
specifying "interrupts=0" to tpm_tis.ko behaves identically to the
device being placed on the list.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] tpm_tis_core: Disable broken IRQ handling code
  2020-05-07 21:51       ` Dan Williams
@ 2020-05-07 23:13         ` Jerry Snitselaar
  2020-05-08  7:45         ` Hans de Goede
  2020-05-13 21:31         ` Jarkko Sakkinen
  2 siblings, 0 replies; 22+ messages in thread
From: Jerry Snitselaar @ 2020-05-07 23:13 UTC (permalink / raw)
  To: Dan Williams
  Cc: Hans de Goede, Jarkko Sakkinen, Peter Huewe, Jason Gunthorpe,
	Stefan Berger, Arnd Bergmann, Greg Kroah-Hartman,
	linux-integrity, Mark Pearson

On Thu May 07 20, Dan Williams wrote:
>On Thu, May 7, 2020 at 7:17 AM Jerry Snitselaar <jsnitsel@redhat.com> wrote:
>>
>> On Thu May 07 20, Hans de Goede wrote:
>> >Hi All,
>> >
>> >On 4/10/20 11:06 PM, Jarkko Sakkinen wrote:
>> >>On Thu, Apr 09, 2020 at 11:10:44PM +0200, Hans de Goede wrote:
>> >>>Since commit dda8b2af395b ("tpm: Revert "tpm_tis_core: Set
>> >>>TPM_CHIP_FLAG_IRQ before probing for interrupts"") we no longer set
>> >>>the TPM_CHIP_FLAG_IRQ ever.
>> >>>
>> >>>So the whole IRQ probing code is not useful, worse we rely on the
>> >>>IRQ-test path of tpm_tis_send() to call disable_interrupts() if
>> >>>interrupts do not work, but that path never gets entered because we
>> >>>never set the TPM_CHIP_FLAG_IRQ.
>> >>>
>> >>>So the remaining IRQ probe code calls request_irq() and never calls
>> >>>free_irq() even when the interrupt is not working.
>> >>>
>> >>>On some systems, e.g. the Lenovo X1 8th gen,  the interrupt we try
>> >>>to use and never free creates an interrupt storm followed by
>> >>>an "irq XX: nobody cared" oops.
>> >>>
>> >>>Since it is non-functional at the moment anyways, lets just completely
>> >>>disable the IRQ code in tpm_tis_core for now.
>> >>>
>> >>>Fixes: dda8b2af395b ("tpm: Revert "tpm_tis_core: Set TPM_CHIP_FLAG_IRQ before probing for interrupts"")
>> >>>Signed-off-by: Hans de Goede <hdegoede@redhat.com>
>> >>>---
>> >>>Note I'm working with Lenovo to try and get to the bottom of this.
>> >>>---
>> >>
>> >>OK if I recall correctly the reason for reverting was that the fixes
>> >>Stefan was sending were broken and no access to hardware were the
>> >>issues would be visible. The reason for not doing anything til this
>> >>day is that we don't have T490 available.
>> >
>> >So as promised I have been in contact with Lenovo about this.
>> >
>> >Specifically I have been in contact with Lenovo about seeing an
>> >IRQ storm when the tpm_tis code tries to use the IRQ on a X1 carbon
>> >8th gen (X1C8), because of the now public plan that Lenovo will
>> >offer ordering this model with Fedora pre-installed:
>> >https://lwn.net/Articles/818595/
>> >
>> >On the X1C8 the problem has been diagnosed to be a misconfigured
>> >GPIO pin on the CPU (the SoC). The X1C8 uses an SPI connected
>> >TPM chip with its IRQ connected to a GPIO on the SoC which is
>> >configured in Direct IRQ mode, so that it directly asserts
>> >IRQs on one of the APIC IRQs.  The problem is that due to the
>> >misconfiguration as soon as the IRQ is enabled it fires
>> >continuously.
>> >
>> >For the X1C8 this should be fixed in the BIOS of the first
>> >batch which gets shipped out to customers so there we should
>> >not have to worry about this.
>> >
>> >It is likely (but not yet confirmed) that the issue on the T490
>> >is the same, although on my test X1C8 device I got an IRQ storm,
>> >followed by the kernel disabling the IRQ, not a non booting system.
>> >I guess this might be due to kernel configuration differences.
>> >
>> >Assuming that the issue on the T490 is the same, we might see a
>> >BIOS update fixing this, but given that non-booting is
>> >'not good ("tm")' even if there will be a BIOS fix we should
>> >still do something at the kernel level to also work with the
>> >older unfixed BIOS which is already out there.
>> >
>> >I've been thinking about this and I'm afraid that the only thing
>> >what we can do is add a DMI product-name (product-version for Lenovo)
>> >string based blacklist for IRQ usage to drivers/char/tpm/tpm_tis.c
>> >and set tpm_info.irq = -1 for devices on that list.
>> >
>> >My plan is to prepare a RFC patch of such a blacklist, while we
>> >wait for confirmation that the root cause on the T490 is the same
>> >as on the X1C8, but before I work on that I'm wondering if
>> >people agree that that is the best approach, or if there are
>> >other suggestions for dealing with this ?
>> >
>> >Regards,
>> >
>> >Hans
>> >
>>
>> Dan,
>>
>> Could this be the cause of the problem on the system you were
>> seeing the issue with, or was that using PTT?
>
>It sounds similar, I'm just not immediately aware of where I can find
>out how the GPIOs are routed on that development board. I'll poke
>around.
>
>What's PTT?
>

PTT is Intel's firmware based TPM.

>My concern with a blacklist is that the existing tpm_tis module
>parameter to disable interrupts, IIRC, did not help mitigate this
>problem. So I would think that if there is a blackilst it should at
>least be amenable by module parameter for new platforms, or that
>specifying "interrupts=0" to tpm_tis.ko behaves identically to the
>device being placed on the list.
>


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] tpm_tis_core: Disable broken IRQ handling code
  2020-05-07 21:51       ` Dan Williams
  2020-05-07 23:13         ` Jerry Snitselaar
@ 2020-05-08  7:45         ` Hans de Goede
  2020-05-13 21:31         ` Jarkko Sakkinen
  2 siblings, 0 replies; 22+ messages in thread
From: Hans de Goede @ 2020-05-08  7:45 UTC (permalink / raw)
  To: Dan Williams, Jerry Snitselaar
  Cc: Jarkko Sakkinen, Peter Huewe, Jason Gunthorpe, Stefan Berger,
	Arnd Bergmann, Greg Kroah-Hartman, linux-integrity, Mark Pearson

Hi,

On 5/7/20 11:51 PM, Dan Williams wrote:
<snip>

> My concern with a blacklist is that the existing tpm_tis module
> parameter to disable interrupts, IIRC, did not help mitigate this
> problem. So I would think that if there is a blackilst it should at
> least be amenable by module parameter for new platforms, or that
> specifying "interrupts=0" to tpm_tis.ko behaves identically to the
> device being placed on the list.

Are you sure that disabling the interrupt through the kernel commandline
did not help on the T490? The reverted patches all have to do with
the irq-probing / irq-handling and if tpm_tis_core_init gets passed
an irq of -1 then all of that should be skipped?

Regards,

Hans


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] tpm_tis_core: Disable broken IRQ handling code
  2020-05-07 21:51       ` Dan Williams
  2020-05-07 23:13         ` Jerry Snitselaar
  2020-05-08  7:45         ` Hans de Goede
@ 2020-05-13 21:31         ` Jarkko Sakkinen
  2 siblings, 0 replies; 22+ messages in thread
From: Jarkko Sakkinen @ 2020-05-13 21:31 UTC (permalink / raw)
  To: Dan Williams
  Cc: Jerry Snitselaar, Hans de Goede, Peter Huewe, Jason Gunthorpe,
	Stefan Berger, Arnd Bergmann, Greg Kroah-Hartman,
	linux-integrity, Mark Pearson

On Thu, May 07, 2020 at 02:51:43PM -0700, Dan Williams wrote:
> On Thu, May 7, 2020 at 7:17 AM Jerry Snitselaar <jsnitsel@redhat.com> wrote:
> >
> > On Thu May 07 20, Hans de Goede wrote:
> > >Hi All,
> > >
> > >On 4/10/20 11:06 PM, Jarkko Sakkinen wrote:
> > >>On Thu, Apr 09, 2020 at 11:10:44PM +0200, Hans de Goede wrote:
> > >>>Since commit dda8b2af395b ("tpm: Revert "tpm_tis_core: Set
> > >>>TPM_CHIP_FLAG_IRQ before probing for interrupts"") we no longer set
> > >>>the TPM_CHIP_FLAG_IRQ ever.
> > >>>
> > >>>So the whole IRQ probing code is not useful, worse we rely on the
> > >>>IRQ-test path of tpm_tis_send() to call disable_interrupts() if
> > >>>interrupts do not work, but that path never gets entered because we
> > >>>never set the TPM_CHIP_FLAG_IRQ.
> > >>>
> > >>>So the remaining IRQ probe code calls request_irq() and never calls
> > >>>free_irq() even when the interrupt is not working.
> > >>>
> > >>>On some systems, e.g. the Lenovo X1 8th gen,  the interrupt we try
> > >>>to use and never free creates an interrupt storm followed by
> > >>>an "irq XX: nobody cared" oops.
> > >>>
> > >>>Since it is non-functional at the moment anyways, lets just completely
> > >>>disable the IRQ code in tpm_tis_core for now.
> > >>>
> > >>>Fixes: dda8b2af395b ("tpm: Revert "tpm_tis_core: Set TPM_CHIP_FLAG_IRQ before probing for interrupts"")
> > >>>Signed-off-by: Hans de Goede <hdegoede@redhat.com>
> > >>>---
> > >>>Note I'm working with Lenovo to try and get to the bottom of this.
> > >>>---
> > >>
> > >>OK if I recall correctly the reason for reverting was that the fixes
> > >>Stefan was sending were broken and no access to hardware were the
> > >>issues would be visible. The reason for not doing anything til this
> > >>day is that we don't have T490 available.
> > >
> > >So as promised I have been in contact with Lenovo about this.
> > >
> > >Specifically I have been in contact with Lenovo about seeing an
> > >IRQ storm when the tpm_tis code tries to use the IRQ on a X1 carbon
> > >8th gen (X1C8), because of the now public plan that Lenovo will
> > >offer ordering this model with Fedora pre-installed:
> > >https://lwn.net/Articles/818595/
> > >
> > >On the X1C8 the problem has been diagnosed to be a misconfigured
> > >GPIO pin on the CPU (the SoC). The X1C8 uses an SPI connected
> > >TPM chip with its IRQ connected to a GPIO on the SoC which is
> > >configured in Direct IRQ mode, so that it directly asserts
> > >IRQs on one of the APIC IRQs.  The problem is that due to the
> > >misconfiguration as soon as the IRQ is enabled it fires
> > >continuously.
> > >
> > >For the X1C8 this should be fixed in the BIOS of the first
> > >batch which gets shipped out to customers so there we should
> > >not have to worry about this.
> > >
> > >It is likely (but not yet confirmed) that the issue on the T490
> > >is the same, although on my test X1C8 device I got an IRQ storm,
> > >followed by the kernel disabling the IRQ, not a non booting system.
> > >I guess this might be due to kernel configuration differences.
> > >
> > >Assuming that the issue on the T490 is the same, we might see a
> > >BIOS update fixing this, but given that non-booting is
> > >'not good ("tm")' even if there will be a BIOS fix we should
> > >still do something at the kernel level to also work with the
> > >older unfixed BIOS which is already out there.
> > >
> > >I've been thinking about this and I'm afraid that the only thing
> > >what we can do is add a DMI product-name (product-version for Lenovo)
> > >string based blacklist for IRQ usage to drivers/char/tpm/tpm_tis.c
> > >and set tpm_info.irq = -1 for devices on that list.
> > >
> > >My plan is to prepare a RFC patch of such a blacklist, while we
> > >wait for confirmation that the root cause on the T490 is the same
> > >as on the X1C8, but before I work on that I'm wondering if
> > >people agree that that is the best approach, or if there are
> > >other suggestions for dealing with this ?
> > >
> > >Regards,
> > >
> > >Hans
> > >
> >
> > Dan,
> >
> > Could this be the cause of the problem on the system you were
> > seeing the issue with, or was that using PTT?
> 
> It sounds similar, I'm just not immediately aware of where I can find
> out how the GPIOs are routed on that development board. I'll poke
> around.
> 
> What's PTT?

Intel fTPM.

/Jarkko

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] tpm_tis_core: Disable broken IRQ handling code
  2020-05-07 14:02   ` Hans de Goede
  2020-05-07 14:17     ` Jerry Snitselaar
@ 2020-06-04  9:17     ` Jerry Snitselaar
  2020-06-15 22:06       ` Jarkko Sakkinen
  2020-06-16  8:24       ` Hans de Goede
  1 sibling, 2 replies; 22+ messages in thread
From: Jerry Snitselaar @ 2020-06-04  9:17 UTC (permalink / raw)
  To: Jarkko Sakkinen
  Cc: Hans de Goede, Peter Huewe, Jason Gunthorpe, Stefan Berger,
	Arnd Bergmann, Greg Kroah-Hartman, linux-integrity, Mark Pearson

On Thu May 07 20, Hans de Goede wrote:
>Hi All,
>
>On 4/10/20 11:06 PM, Jarkko Sakkinen wrote:
>>On Thu, Apr 09, 2020 at 11:10:44PM +0200, Hans de Goede wrote:
>>>Since commit dda8b2af395b ("tpm: Revert "tpm_tis_core: Set
>>>TPM_CHIP_FLAG_IRQ before probing for interrupts"") we no longer set
>>>the TPM_CHIP_FLAG_IRQ ever.
>>>
>>>So the whole IRQ probing code is not useful, worse we rely on the
>>>IRQ-test path of tpm_tis_send() to call disable_interrupts() if
>>>interrupts do not work, but that path never gets entered because we
>>>never set the TPM_CHIP_FLAG_IRQ.
>>>
>>>So the remaining IRQ probe code calls request_irq() and never calls
>>>free_irq() even when the interrupt is not working.
>>>
>>>On some systems, e.g. the Lenovo X1 8th gen,  the interrupt we try
>>>to use and never free creates an interrupt storm followed by
>>>an "irq XX: nobody cared" oops.
>>>
>>>Since it is non-functional at the moment anyways, lets just completely
>>>disable the IRQ code in tpm_tis_core for now.
>>>
>>>Fixes: dda8b2af395b ("tpm: Revert "tpm_tis_core: Set TPM_CHIP_FLAG_IRQ before probing for interrupts"")
>>>Signed-off-by: Hans de Goede <hdegoede@redhat.com>
>>>---
>>>Note I'm working with Lenovo to try and get to the bottom of this.
>>>---
>>
>>OK if I recall correctly the reason for reverting was that the fixes
>>Stefan was sending were broken and no access to hardware were the
>>issues would be visible. The reason for not doing anything til this
>>day is that we don't have T490 available.
>
>So as promised I have been in contact with Lenovo about this.
>
>Specifically I have been in contact with Lenovo about seeing an
>IRQ storm when the tpm_tis code tries to use the IRQ on a X1 carbon
>8th gen (X1C8), because of the now public plan that Lenovo will
>offer ordering this model with Fedora pre-installed:
>https://lwn.net/Articles/818595/
>
>On the X1C8 the problem has been diagnosed to be a misconfigured
>GPIO pin on the CPU (the SoC). The X1C8 uses an SPI connected
>TPM chip with its IRQ connected to a GPIO on the SoC which is
>configured in Direct IRQ mode, so that it directly asserts
>IRQs on one of the APIC IRQs.  The problem is that due to the
>misconfiguration as soon as the IRQ is enabled it fires
>continuously.
>
>For the X1C8 this should be fixed in the BIOS of the first
>batch which gets shipped out to customers so there we should
>not have to worry about this.
>
>It is likely (but not yet confirmed) that the issue on the T490
>is the same, although on my test X1C8 device I got an IRQ storm,
>followed by the kernel disabling the IRQ, not a non booting system.
>I guess this might be due to kernel configuration differences.
>
>Assuming that the issue on the T490 is the same, we might see a
>BIOS update fixing this, but given that non-booting is
>'not good ("tm")' even if there will be a BIOS fix we should
>still do something at the kernel level to also work with the
>older unfixed BIOS which is already out there.
>
>I've been thinking about this and I'm afraid that the only thing
>what we can do is add a DMI product-name (product-version for Lenovo)
>string based blacklist for IRQ usage to drivers/char/tpm/tpm_tis.c
>and set tpm_info.irq = -1 for devices on that list.
>
>My plan is to prepare a RFC patch of such a blacklist, while we
>wait for confirmation that the root cause on the T490 is the same
>as on the X1C8, but before I work on that I'm wondering if
>people agree that that is the best approach, or if there are
>other suggestions for dealing with this ?
>
>Regards,
>
>Hans
>

Jarrko,

Any thoughts about how we should move forward on dealing with this?
I've got a report about the original problem Stefan was dealing with,
where the tpm isn't powered on when it tries to send a command to
generate an interrupt. The tpm is functioning so it isn't urgent,
but it would be good to get this cleaned up so users aren't getting
transmit errors and firmware bug messages. Hans did you make any
progress on the blacklist patch?


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] tpm_tis_core: Disable broken IRQ handling code
  2020-06-04  9:17     ` Jerry Snitselaar
@ 2020-06-15 22:06       ` Jarkko Sakkinen
  2020-06-16  8:24       ` Hans de Goede
  1 sibling, 0 replies; 22+ messages in thread
From: Jarkko Sakkinen @ 2020-06-15 22:06 UTC (permalink / raw)
  To: Jerry Snitselaar
  Cc: Hans de Goede, Peter Huewe, Jason Gunthorpe, Stefan Berger,
	Arnd Bergmann, Greg Kroah-Hartman, linux-integrity, Mark Pearson

On Thu, Jun 04, 2020 at 02:17:12AM -0700, Jerry Snitselaar wrote:
> On Thu May 07 20, Hans de Goede wrote:
> > Hi All,
> > 
> > On 4/10/20 11:06 PM, Jarkko Sakkinen wrote:
> > > On Thu, Apr 09, 2020 at 11:10:44PM +0200, Hans de Goede wrote:
> > > > Since commit dda8b2af395b ("tpm: Revert "tpm_tis_core: Set
> > > > TPM_CHIP_FLAG_IRQ before probing for interrupts"") we no longer set
> > > > the TPM_CHIP_FLAG_IRQ ever.
> > > > 
> > > > So the whole IRQ probing code is not useful, worse we rely on the
> > > > IRQ-test path of tpm_tis_send() to call disable_interrupts() if
> > > > interrupts do not work, but that path never gets entered because we
> > > > never set the TPM_CHIP_FLAG_IRQ.
> > > > 
> > > > So the remaining IRQ probe code calls request_irq() and never calls
> > > > free_irq() even when the interrupt is not working.
> > > > 
> > > > On some systems, e.g. the Lenovo X1 8th gen,  the interrupt we try
> > > > to use and never free creates an interrupt storm followed by
> > > > an "irq XX: nobody cared" oops.
> > > > 
> > > > Since it is non-functional at the moment anyways, lets just completely
> > > > disable the IRQ code in tpm_tis_core for now.
> > > > 
> > > > Fixes: dda8b2af395b ("tpm: Revert "tpm_tis_core: Set TPM_CHIP_FLAG_IRQ before probing for interrupts"")
> > > > Signed-off-by: Hans de Goede <hdegoede@redhat.com>
> > > > ---
> > > > Note I'm working with Lenovo to try and get to the bottom of this.
> > > > ---
> > > 
> > > OK if I recall correctly the reason for reverting was that the fixes
> > > Stefan was sending were broken and no access to hardware were the
> > > issues would be visible. The reason for not doing anything til this
> > > day is that we don't have T490 available.
> > 
> > So as promised I have been in contact with Lenovo about this.
> > 
> > Specifically I have been in contact with Lenovo about seeing an
> > IRQ storm when the tpm_tis code tries to use the IRQ on a X1 carbon
> > 8th gen (X1C8), because of the now public plan that Lenovo will
> > offer ordering this model with Fedora pre-installed:
> > https://lwn.net/Articles/818595/
> > 
> > On the X1C8 the problem has been diagnosed to be a misconfigured
> > GPIO pin on the CPU (the SoC). The X1C8 uses an SPI connected
> > TPM chip with its IRQ connected to a GPIO on the SoC which is
> > configured in Direct IRQ mode, so that it directly asserts
> > IRQs on one of the APIC IRQs.  The problem is that due to the
> > misconfiguration as soon as the IRQ is enabled it fires
> > continuously.
> > 
> > For the X1C8 this should be fixed in the BIOS of the first
> > batch which gets shipped out to customers so there we should
> > not have to worry about this.
> > 
> > It is likely (but not yet confirmed) that the issue on the T490
> > is the same, although on my test X1C8 device I got an IRQ storm,
> > followed by the kernel disabling the IRQ, not a non booting system.
> > I guess this might be due to kernel configuration differences.
> > 
> > Assuming that the issue on the T490 is the same, we might see a
> > BIOS update fixing this, but given that non-booting is
> > 'not good ("tm")' even if there will be a BIOS fix we should
> > still do something at the kernel level to also work with the
> > older unfixed BIOS which is already out there.
> > 
> > I've been thinking about this and I'm afraid that the only thing
> > what we can do is add a DMI product-name (product-version for Lenovo)
> > string based blacklist for IRQ usage to drivers/char/tpm/tpm_tis.c
> > and set tpm_info.irq = -1 for devices on that list.
> > 
> > My plan is to prepare a RFC patch of such a blacklist, while we
> > wait for confirmation that the root cause on the T490 is the same
> > as on the X1C8, but before I work on that I'm wondering if
> > people agree that that is the best approach, or if there are
> > other suggestions for dealing with this ?
> > 
> > Regards,
> > 
> > Hans
> > 
> 
> Jarrko,
> 
> Any thoughts about how we should move forward on dealing with this?
> I've got a report about the original problem Stefan was dealing with,
> where the tpm isn't powered on when it tries to send a command to
> generate an interrupt. The tpm is functioning so it isn't urgent,
> but it would be good to get this cleaned up so users aren't getting
> transmit errors and firmware bug messages. Hans did you make any
> progress on the blacklist patch?

We need the blacklist. Up until that warnings is the best we can do.

/Jarkko

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] tpm_tis_core: Disable broken IRQ handling code
  2020-06-04  9:17     ` Jerry Snitselaar
  2020-06-15 22:06       ` Jarkko Sakkinen
@ 2020-06-16  8:24       ` Hans de Goede
  2020-06-17 22:56         ` Jarkko Sakkinen
  1 sibling, 1 reply; 22+ messages in thread
From: Hans de Goede @ 2020-06-16  8:24 UTC (permalink / raw)
  To: Jerry Snitselaar, Jarkko Sakkinen
  Cc: Peter Huewe, Jason Gunthorpe, Stefan Berger, Arnd Bergmann,
	Greg Kroah-Hartman, linux-integrity, Mark Pearson

Hi,

On 6/4/20 11:17 AM, Jerry Snitselaar wrote:
> On Thu May 07 20, Hans de Goede wrote:
>> Hi All,
>>
>> On 4/10/20 11:06 PM, Jarkko Sakkinen wrote:
>>> On Thu, Apr 09, 2020 at 11:10:44PM +0200, Hans de Goede wrote:
>>>> Since commit dda8b2af395b ("tpm: Revert "tpm_tis_core: Set
>>>> TPM_CHIP_FLAG_IRQ before probing for interrupts"") we no longer set
>>>> the TPM_CHIP_FLAG_IRQ ever.
>>>>
>>>> So the whole IRQ probing code is not useful, worse we rely on the
>>>> IRQ-test path of tpm_tis_send() to call disable_interrupts() if
>>>> interrupts do not work, but that path never gets entered because we
>>>> never set the TPM_CHIP_FLAG_IRQ.
>>>>
>>>> So the remaining IRQ probe code calls request_irq() and never calls
>>>> free_irq() even when the interrupt is not working.
>>>>
>>>> On some systems, e.g. the Lenovo X1 8th gen,  the interrupt we try
>>>> to use and never free creates an interrupt storm followed by
>>>> an "irq XX: nobody cared" oops.
>>>>
>>>> Since it is non-functional at the moment anyways, lets just completely
>>>> disable the IRQ code in tpm_tis_core for now.
>>>>
>>>> Fixes: dda8b2af395b ("tpm: Revert "tpm_tis_core: Set TPM_CHIP_FLAG_IRQ before probing for interrupts"")
>>>> Signed-off-by: Hans de Goede <hdegoede@redhat.com>
>>>> ---
>>>> Note I'm working with Lenovo to try and get to the bottom of this.
>>>> ---
>>>
>>> OK if I recall correctly the reason for reverting was that the fixes
>>> Stefan was sending were broken and no access to hardware were the
>>> issues would be visible. The reason for not doing anything til this
>>> day is that we don't have T490 available.
>>
>> So as promised I have been in contact with Lenovo about this.
>>
>> Specifically I have been in contact with Lenovo about seeing an
>> IRQ storm when the tpm_tis code tries to use the IRQ on a X1 carbon
>> 8th gen (X1C8), because of the now public plan that Lenovo will
>> offer ordering this model with Fedora pre-installed:
>> https://lwn.net/Articles/818595/
>>
>> On the X1C8 the problem has been diagnosed to be a misconfigured
>> GPIO pin on the CPU (the SoC). The X1C8 uses an SPI connected
>> TPM chip with its IRQ connected to a GPIO on the SoC which is
>> configured in Direct IRQ mode, so that it directly asserts
>> IRQs on one of the APIC IRQs.  The problem is that due to the
>> misconfiguration as soon as the IRQ is enabled it fires
>> continuously.
>>
>> For the X1C8 this should be fixed in the BIOS of the first
>> batch which gets shipped out to customers so there we should
>> not have to worry about this.
>>
>> It is likely (but not yet confirmed) that the issue on the T490
>> is the same, although on my test X1C8 device I got an IRQ storm,
>> followed by the kernel disabling the IRQ, not a non booting system.
>> I guess this might be due to kernel configuration differences.
>>
>> Assuming that the issue on the T490 is the same, we might see a
>> BIOS update fixing this, but given that non-booting is
>> 'not good ("tm")' even if there will be a BIOS fix we should
>> still do something at the kernel level to also work with the
>> older unfixed BIOS which is already out there.
>>
>> I've been thinking about this and I'm afraid that the only thing
>> what we can do is add a DMI product-name (product-version for Lenovo)
>> string based blacklist for IRQ usage to drivers/char/tpm/tpm_tis.c
>> and set tpm_info.irq = -1 for devices on that list.
>>
>> My plan is to prepare a RFC patch of such a blacklist, while we
>> wait for confirmation that the root cause on the T490 is the same
>> as on the X1C8, but before I work on that I'm wondering if
>> people agree that that is the best approach, or if there are
>> other suggestions for dealing with this ?
>>
>> Regards,
>>
>> Hans
>>
> 
> Jarrko,
> 
> Any thoughts about how we should move forward on dealing with this?
> I've got a report about the original problem Stefan was dealing with,
> where the tpm isn't powered on when it tries to send a command to
> generate an interrupt. The tpm is functioning so it isn't urgent,
> but it would be good to get this cleaned up so users aren't getting
> transmit errors and firmware bug messages. Hans did you make any
> progress on the blacklist patch?

Not really. I wanted to confirm on the X1C8 which I have on loan
from Lenovo that it indeed was the GPIO pin misconfiguration which
was the issue. But even though the Lenovo BIOS team claimed that they
have fixed the root cause now, the workaround of not listing any
IRQ for the TPM is also still in place in the latest BIOS I have.

So I cannot test that the IRQ storm is gone with the newer BIOS :|

Still the GPIO pin misconfiguration likely was the issue of the
storm I was seeing on the X1C8 and also the T490s likely has the
same issue. I should actually receive a T490s loaner soon-ish,
because of some unrelated (Builtin privacy screen) work I'm doing.

Once I have received the T490s then I can test things on the T490s:
1. Revert the reverts so that in theory we get working TPM IRQ
    support in the kernel again
2. Check that the T490s does not like this
3. Write and test a blacklist patch

I'll ping my colleague who has arranged the loaner to ask what
the status is on it.

Regards,

Hans


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] tpm_tis_core: Disable broken IRQ handling code
  2020-06-16  8:24       ` Hans de Goede
@ 2020-06-17 22:56         ` Jarkko Sakkinen
  0 siblings, 0 replies; 22+ messages in thread
From: Jarkko Sakkinen @ 2020-06-17 22:56 UTC (permalink / raw)
  To: Hans de Goede
  Cc: Jerry Snitselaar, Peter Huewe, Jason Gunthorpe, Stefan Berger,
	Arnd Bergmann, Greg Kroah-Hartman, linux-integrity, Mark Pearson

On Tue, Jun 16, 2020 at 10:24:21AM +0200, Hans de Goede wrote:
> Hi,
> 
> On 6/4/20 11:17 AM, Jerry Snitselaar wrote:
> > On Thu May 07 20, Hans de Goede wrote:
> > > Hi All,
> > > 
> > > On 4/10/20 11:06 PM, Jarkko Sakkinen wrote:
> > > > On Thu, Apr 09, 2020 at 11:10:44PM +0200, Hans de Goede wrote:
> > > > > Since commit dda8b2af395b ("tpm: Revert "tpm_tis_core: Set
> > > > > TPM_CHIP_FLAG_IRQ before probing for interrupts"") we no longer set
> > > > > the TPM_CHIP_FLAG_IRQ ever.
> > > > > 
> > > > > So the whole IRQ probing code is not useful, worse we rely on the
> > > > > IRQ-test path of tpm_tis_send() to call disable_interrupts() if
> > > > > interrupts do not work, but that path never gets entered because we
> > > > > never set the TPM_CHIP_FLAG_IRQ.
> > > > > 
> > > > > So the remaining IRQ probe code calls request_irq() and never calls
> > > > > free_irq() even when the interrupt is not working.
> > > > > 
> > > > > On some systems, e.g. the Lenovo X1 8th gen,  the interrupt we try
> > > > > to use and never free creates an interrupt storm followed by
> > > > > an "irq XX: nobody cared" oops.
> > > > > 
> > > > > Since it is non-functional at the moment anyways, lets just completely
> > > > > disable the IRQ code in tpm_tis_core for now.
> > > > > 
> > > > > Fixes: dda8b2af395b ("tpm: Revert "tpm_tis_core: Set TPM_CHIP_FLAG_IRQ before probing for interrupts"")
> > > > > Signed-off-by: Hans de Goede <hdegoede@redhat.com>
> > > > > ---
> > > > > Note I'm working with Lenovo to try and get to the bottom of this.
> > > > > ---
> > > > 
> > > > OK if I recall correctly the reason for reverting was that the fixes
> > > > Stefan was sending were broken and no access to hardware were the
> > > > issues would be visible. The reason for not doing anything til this
> > > > day is that we don't have T490 available.
> > > 
> > > So as promised I have been in contact with Lenovo about this.
> > > 
> > > Specifically I have been in contact with Lenovo about seeing an
> > > IRQ storm when the tpm_tis code tries to use the IRQ on a X1 carbon
> > > 8th gen (X1C8), because of the now public plan that Lenovo will
> > > offer ordering this model with Fedora pre-installed:
> > > https://lwn.net/Articles/818595/
> > > 
> > > On the X1C8 the problem has been diagnosed to be a misconfigured
> > > GPIO pin on the CPU (the SoC). The X1C8 uses an SPI connected
> > > TPM chip with its IRQ connected to a GPIO on the SoC which is
> > > configured in Direct IRQ mode, so that it directly asserts
> > > IRQs on one of the APIC IRQs.  The problem is that due to the
> > > misconfiguration as soon as the IRQ is enabled it fires
> > > continuously.
> > > 
> > > For the X1C8 this should be fixed in the BIOS of the first
> > > batch which gets shipped out to customers so there we should
> > > not have to worry about this.
> > > 
> > > It is likely (but not yet confirmed) that the issue on the T490
> > > is the same, although on my test X1C8 device I got an IRQ storm,
> > > followed by the kernel disabling the IRQ, not a non booting system.
> > > I guess this might be due to kernel configuration differences.
> > > 
> > > Assuming that the issue on the T490 is the same, we might see a
> > > BIOS update fixing this, but given that non-booting is
> > > 'not good ("tm")' even if there will be a BIOS fix we should
> > > still do something at the kernel level to also work with the
> > > older unfixed BIOS which is already out there.
> > > 
> > > I've been thinking about this and I'm afraid that the only thing
> > > what we can do is add a DMI product-name (product-version for Lenovo)
> > > string based blacklist for IRQ usage to drivers/char/tpm/tpm_tis.c
> > > and set tpm_info.irq = -1 for devices on that list.
> > > 
> > > My plan is to prepare a RFC patch of such a blacklist, while we
> > > wait for confirmation that the root cause on the T490 is the same
> > > as on the X1C8, but before I work on that I'm wondering if
> > > people agree that that is the best approach, or if there are
> > > other suggestions for dealing with this ?
> > > 
> > > Regards,
> > > 
> > > Hans
> > > 
> > 
> > Jarrko,
> > 
> > Any thoughts about how we should move forward on dealing with this?
> > I've got a report about the original problem Stefan was dealing with,
> > where the tpm isn't powered on when it tries to send a command to
> > generate an interrupt. The tpm is functioning so it isn't urgent,
> > but it would be good to get this cleaned up so users aren't getting
> > transmit errors and firmware bug messages. Hans did you make any
> > progress on the blacklist patch?
> 
> Not really. I wanted to confirm on the X1C8 which I have on loan
> from Lenovo that it indeed was the GPIO pin misconfiguration which
> was the issue. But even though the Lenovo BIOS team claimed that they
> have fixed the root cause now, the workaround of not listing any
> IRQ for the TPM is also still in place in the latest BIOS I have.
> 
> So I cannot test that the IRQ storm is gone with the newer BIOS :|
> 
> Still the GPIO pin misconfiguration likely was the issue of the
> storm I was seeing on the X1C8 and also the T490s likely has the
> same issue. I should actually receive a T490s loaner soon-ish,
> because of some unrelated (Builtin privacy screen) work I'm doing.
> 
> Once I have received the T490s then I can test things on the T490s:
> 1. Revert the reverts so that in theory we get working TPM IRQ
>    support in the kernel again
> 2. Check that the T490s does not like this
> 3. Write and test a blacklist patch
> 
> I'll ping my colleague who has arranged the loaner to ask what
> the status is on it.

Thanks alot. Does not make sense to rush with this now that the
tree is not completely broken.

/Jarkko

^ permalink raw reply	[flat|nested] 22+ messages in thread

end of thread, other threads:[~2020-06-17 22:56 UTC | newest]

Thread overview: 22+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-04-09 21:10 [PATCH] tpm_tis_core: Disable broken IRQ handling code Hans de Goede
2020-04-10 16:38 ` Jason Gunthorpe
2020-04-10 21:08   ` Jarkko Sakkinen
2020-04-10 21:12   ` Jarkko Sakkinen
2020-04-14 17:44   ` Jerry Snitselaar
2020-04-15  8:57     ` Jan Lübbe
2020-04-16 17:03     ` Jarkko Sakkinen
2020-04-10 21:06 ` Jarkko Sakkinen
2020-04-10 21:09   ` Jarkko Sakkinen
2020-04-12 14:55     ` Jarkko Sakkinen
2020-05-07 14:02   ` Hans de Goede
2020-05-07 14:02   ` Hans de Goede
2020-05-07 14:17     ` Jerry Snitselaar
2020-05-07 17:38       ` Hans de Goede
2020-05-07 21:51       ` Dan Williams
2020-05-07 23:13         ` Jerry Snitselaar
2020-05-08  7:45         ` Hans de Goede
2020-05-13 21:31         ` Jarkko Sakkinen
2020-06-04  9:17     ` Jerry Snitselaar
2020-06-15 22:06       ` Jarkko Sakkinen
2020-06-16  8:24       ` Hans de Goede
2020-06-17 22:56         ` Jarkko Sakkinen

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).