All of lore.kernel.org
 help / color / mirror / Atom feed
From: Jarkko Sakkinen <jarkko@kernel.org>
To: Hans de Goede <hdegoede@redhat.com>
Cc: Jerry Snitselaar <jsnitsel@redhat.com>,
	Matthew Garrett <mjg59@google.com>,
	linux-integrity <linux-integrity@vger.kernel.org>,
	James Bottomley <James.Bottomley@hansenpartnership.com>
Subject: Re: Recent tpm_tis IRQ handling changes are causing kernel backtraces]
Date: Fri, 9 Jul 2021 21:44:26 +0300	[thread overview]
Message-ID: <20210709184426.7yftmufrmw3tp24s@kernel.org> (raw)
In-Reply-To: <8819200d-e468-7b09-2e08-e44f5f2ed59f@redhat.com>

On Wed, Jun 30, 2021 at 03:36:55PM +0200, Hans de Goede wrote:
> Hi,
> 
> On 6/30/21 2:47 PM, Hans de Goede wrote:
> > Hi,
> > 
> > On 6/30/21 12:05 AM, Jarkko Sakkinen wrote:
> >> On Tue, Jun 29, 2021 at 09:14:39PM +0200, Hans de Goede wrote:
> >>> Hi,
> >>>
> >>> On 6/29/21 8:04 PM, Jarkko Sakkinen wrote:
> >>>> On Wed, Jun 23, 2021 at 03:54:59PM +0200, Hans de Goede wrote:
> >>>>> Hi,
> >>>>>
> >>>>> On 6/23/21 3:40 PM, Jarkko Sakkinen wrote:
> >>>>>> On Mon, Jun 21, 2021 at 02:04:52PM +0200, Hans de Goede wrote:
> >>>>>>> Hi,
> >>>>>>>
> >>>>>>> On 6/14/21 3:33 PM, Hans de Goede wrote:
> >>>>>>>> Hi,
> >>>>>>>>
> >>>>>>>> On 6/1/21 6:04 PM, Hans de Goede wrote:
> >>>>>>>>> Hi,
> >>>>>>>>>
> >>>>>>>>> On 5/31/21 6:36 AM, Jarkko Sakkinen wrote:
> >>>>>>>>>>> Interestingly enough the first backtrace is also happening on a:
> >>>>>>>>>>> "Dell Inc. XPS 13 9310/0MRT12, BIOS 2.2.0 04/06/2021"
> >>>>>>>>>>>
> >>>>>>>>>>> So it seems that at least with 5.12.6 (which has the last 2 fixes)
> >>>>>>>>>>> all reports are about the XPS 13 9310. I wonder if there is an
> >>>>>>>>>>> issue with the TPM interrupt line on the XPS 13 9310; I've asked the
> >>>>>>>>>>> reporters to try adding tpm_tis.interrupts=0 to their kernel commandline.
> >>>>>>>>>>
> >>>>>>>>>> This is helpful for sure that these all are happening on matching hardware.
> >>>>>>>>>
> >>>>>>>>> So our kernel-backtrace tracking info (ABRT) just recorded a third backtrace
> >>>>>>>>> with a kernel >= 5.12.6, again on the XPS 13 9310, so now we have 3 variants:
> >>>>>>>>>
> >>>>>>>>> 1. Backtrace starting with a call to ima_add_boot_aggregate
> >>>>>>>>> https://bugzilla.redhat.com/show_bug.cgi?id=1963712
> >>>>>>>>>
> >>>>>>>>> 2. Backtrace starting with a call to tpm_dev_async_work:
> >>>>>>>>> https://bugzilla.redhat.com/show_bug.cgi?id=1964974
> >>>>>>>>> (note this one is not easily reproducible)
> >>>>>>>>>
> >>>>>>>>> 3. Backtrace starting with a call to rng_dev_read:
> >>>>>>>>> https://bugzilla.redhat.com/show_bug.cgi?id=1920510
> >>>>>>>>>
> >>>>>>>>> 3. is the new one. All bugs linked above are public, all 3 backtraces
> >>>>>>>>> so far have only been reported on the XPS 13 9310 (with kernel >= 5.12.6)
> >>>>>>>>> and I've asked all the reporters to check if tpm_tis.interrupts=0 helps.
> >>>>>>>>
> >>>>>>>> Quick status update, I've got a response from a XPS 13 9310 user in:
> >>>>>>>>
> >>>>>>>> https://bugzilla.redhat.com/show_bug.cgi?id=1920510
> >>>>>>>>
> >>>>>>>> Indicating that a. he can reproduce this with the latest >= 5.12.6 kernels;
> >>>>>>>> and b. it goes away when specifying tpm_tis.interrupts=0 as I expected
> >>>>>>>> (I expected this because all the bug-reports started when the interrupt
> >>>>>>>> code got fixed/re-enabled a while ago).
> >>>>>>>
> >>>>>>> One more status update.
> >>>>>>>
> >>>>>>> - A new 4th variant of the backtrace has been spotted, where the problem hits
> >>>>>>> when called from probe() -> tpm2_auto_startup -> tpm2_do_selftest, see:
> >>>>>>> https://bugzilla.redhat.com/show_bug.cgi?id=1958381
> >>>>>>>
> >>>>>>> - So far all reports with kernel >= 5.12.6 have been on a Dell XPS 13 9310
> >>>>>>> models. But the new variant is happening on a Dell XPS 15 9500 and the
> >>>>>>> backtrace starting at ima_add_boot_aggregate is also being reported on
> >>>>>>> a Dell XPS 15 9500 (as well as on the XPS 13 9310).
> >>>>>>>
> >>>>>>> Regards,
> >>>>>>>
> >>>>>>> Hans
> >>>>>>
> >>>>>> OK, I'll have to query if I could borrow that laptop from someone. It's
> >>>>>> fairly common laptop, i.e. might be possible.
> >>>>>
> >>>>> In the mean time I've also got a report that this variant of the backtrace:
> >>>>>
> >>>>> 1. Backtrace starting with a call to ima_add_boot_aggregate
> >>>>> https://bugzilla.redhat.com/show_bug.cgi?id=1963712
> >>>>>
> >>>>> Is also still happening with recent 5.12.y kernels on
> >>>>> Dell Precision 7750 laptops. Both the Precision 7750 and the XPS 9500 use
> >>>>> 10th gen comet lake processors (i7-10750H), where as the XPS 9310 is using
> >>>>> an icelake processor. So the common denominator seems to be that they are
> >>>>> all 2020 Dell laptop models using the latest Intel CPUs.
> >>>>>
> >>>>> FYI the complete list of models on which some of the 4 backtrace variants
> >>>>> are still seen on recent 5.12.y kernels is now:
> >>>>>
> >>>>> Dell XPS 13 9310
> >>>>> Dell XPS 15 9500
> >>>>> Dell Precision 7750
> >>>>>
> >>>>> Regards,
> >>>>>
> >>>>> Hans
> >>>>
> >>>> Does "tpm_tis.interrupts=0" uniformly workaround the issue?
> >>>
> >>> I unfortunately have not gotten much replies to my request to test with
> >>> tpm_tis.interrupts=0, but for those people who have bothered to test
> >>> (2 reporters IIRC) using tpm_tis.interrupts=0 does avoid the issue.
> >>
> >> So we see this in dmesg as first anything from TPM:
> >>
> >> [    0.904572] tpm_tis STM0125:00: 2.0 TPM (device-id 0x0, rev-id 78)
> >>
> >> This means that one command is successfully processed by the TPM, i.e.
> >> tpm2_probe() in tpm_tis_core_init().
> >>
> >> My first *guess*  was that IRQ is given by ACPI, would need ACPI dump to
> >> confirm (e.g. sudo acpidump > acpi.dump). It cannot be so because otherwise
> >> this code path would be executed:
> >>
> >>         if (!(chip->flags & TPM_CHIP_FLAG_IRQ)) {
> >>                 dev_err(&chip->dev, FW_BUG
> >>                         "TPM interrupt not working, polling instead\n");
> >>
> >>                 disable_interrupts(chip);
> >>         }
> >>
> >> TPM_CHIP_FLAG_IRQ is never set, so you should see this message in dmesg if
> >> a legit value is given to IRQ by ACPI.  We are probably planning re-enable
> >> IRQ code after these type of issues are fully resolved, but right now you
> >> should not end up having it enabled (see tpm_tis_send() function).
> >>
> >> To put this together "if (irq != -1) {" path in tpm_tis_core_init() is
> >> never executed. And early in the same function the interrupt hardware is
> >> *explicitly* disabled.
> >>
> >> For me this looks like a hardware bug right now: interrupts stay enabled
> >> for some reason.
> >>
> >> ACPI dump would be useful to verify some of the assumptions in this.
> > 
> > Ok, I've added a comment to the Fedora bugs for the 4 different backtrace
> > variants asking for acpidumps for the Dell XPS 13 9310, Dell XPS 15 9500
> > and Dell Precision 7750 laptops.
> 
> 2 XPS 9310 acpidumps have been attached to:
> 
> https://bugzilla.redhat.com/show_bug.cgi?id=1920510
> https://bugzilla.redhat.com/show_bug.cgi?id=1964974
> 
> Note the reporter of the first bug mentions that he is no longer having
> this issue, but we are definitely still getting reports for kernel version
> > 5.12.6 (which has the last 2 fixes) from XPS 9310 users...
> 
> Maybe there are different BIOS versions in play ? It might be interesting
> to compare the 2 acpidumps...

❯ diff -u ../tmp/ssdt7.dsl ../tmp2/ssdt7.dsl
--- ../tmp/ssdt7.dsl	2021-07-09 21:32:06.473166420 +0300
+++ ../tmp2/ssdt7.dsl	2021-07-09 21:33:45.065934469 +0300
@@ -5,7 +5,7 @@
  * 
  * Disassembling to symbolic ASL+ operators
  *
- * Disassembly of ssdt7.dat, Fri Jul  9 21:32:06 2021
+ * Disassembly of ssdt7.dat, Fri Jul  9 21:33:45 2021
  *
  * Original Table Header:
  *     Signature        "SSDT"
@@ -121,7 +121,7 @@
                     0xFED40000,         // Address Base
                     0x00005000,         // Address Length
                     )
-                Interrupt (ResourceConsumer, Level, ActiveLow, Shared, ,, _Y58)
+                Interrupt (ResourceConsumer, Level, ActiveLow, Shared, ,, _Y55)
                 {
                     0x0000000C,
                 }
@@ -141,7 +141,7 @@
                 }
                 Else
                 {
-                    CreateDWordField (RES0, \_SB.TPM._Y58._INT, LIRQ)  // _INT: Interrupts
+                    CreateDWordField (RES0, \_SB.TPM._Y55._INT, LIRQ)  // _INT: Interrupts
                     LIRQ = IRQN /* \_SB_.TPM_.IRQN */
                     Return (RES0) /* \_SB_.TPM_.RES0 */
                 }
@@ -152,14 +152,14 @@
                 If ((IRQN != Zero))
                 {
                     CreateDWordField (Arg0, 0x11, IRQ0)
-                    CreateDWordField (RES0, \_SB.TPM._Y58._INT, LIRQ)  // _INT: Interrupts
+                    CreateDWordField (RES0, \_SB.TPM._Y55._INT, LIRQ)  // _INT: Interrupts
                     LIRQ = IRQ0 /* \_SB_.TPM_._SRS.IRQ0 */
                     IRQN = IRQ0 /* \_SB_.TPM_._SRS.IRQ0 */
                     CreateBitField (Arg0, 0x79, ITRG)
-                    CreateBitField (RES0, \_SB.TPM._Y58._HE, LTRG)  // _HE_: High-Edge
+                    CreateBitField (RES0, \_SB.TPM._Y55._HE, LTRG)  // _HE_: High-Edge
                     LTRG = ITRG /* \_SB_.TPM_._SRS.ITRG */
                     CreateBitField (Arg0, 0x7A, ILVL)
-                    CreateBitField (RES0, \_SB.TPM._Y58._LL, LLVL)  // _LL_: Low Level
+                    CreateBitField (RES0, \_SB.TPM._Y55._LL, LLVL)  // _LL_: Low Level
                     LLVL = ILVL /* \_SB_.TPM_._SRS.ILVL */
                     If ((((TID0 & 0x0F) == Zero) || ((TID0 & 0x0F
                         ) == 0x0F)))

This delta from "acpidump for Dell XPS 9310 (with Qualcomm QCA6390)" to
"acpidump output from a Dell XPS 13 9310 that no longer has a problem"
in SSDT7. The bug I'm referring to is

https://bugzilla.redhat.com/show_bug.cgi?id=1920510

Looks to me just using a different label.

What if we just set "interrupts=0" explicitly for STM0125 HID since the
workaround seems to work according to the report?

/Jarkko

  reply	other threads:[~2021-07-09 18:44 UTC|newest]

Thread overview: 29+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-02-11 13:09 Recent tpm_tis IRQ handling changes are causing kernel backtraces Hans de Goede
2021-03-16 15:34 ` Hans de Goede
2021-03-16 19:18   ` Jarkko Sakkinen
2021-05-08  9:07     ` Hans de Goede
2021-05-10 17:25       ` Jarkko Sakkinen
2021-05-11  8:37         ` Hans de Goede
2021-05-11 23:48           ` Jarkko Sakkinen
2021-05-26 19:03           ` Hans de Goede
2021-05-27 14:00             ` Jarkko Sakkinen
2021-05-27 15:27               ` Hans de Goede
     [not found]                 ` <20210531043616.u3v25qzkkrik5apq@kernel.org>
2021-05-31  8:24                   ` Hans de Goede
2021-06-01 18:02                     ` Jarkko Sakkinen
2021-06-01 16:04                   ` Hans de Goede
2021-06-01 18:03                     ` Jarkko Sakkinen
2021-06-14 13:33                     ` Hans de Goede
2021-06-15 13:01                       ` Jarkko Sakkinen
2021-06-15 13:59                         ` Hans de Goede
2021-06-23 13:37                           ` Jarkko Sakkinen
2021-06-21 12:04                       ` Hans de Goede
2021-06-23 13:40                         ` Jarkko Sakkinen
2021-06-23 13:54                           ` Hans de Goede
2021-06-29 18:04                             ` Recent tpm_tis IRQ handling changes are causing kernel backtraces] Jarkko Sakkinen
2021-06-29 19:14                               ` Hans de Goede
2021-06-29 22:05                                 ` Jarkko Sakkinen
2021-06-30 12:47                                   ` Hans de Goede
2021-06-30 13:36                                     ` Hans de Goede
2021-07-09 18:44                                       ` Jarkko Sakkinen [this message]
2021-07-17 16:10                                         ` Hans de Goede
2021-07-27  2:50                                           ` Jarkko Sakkinen

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20210709184426.7yftmufrmw3tp24s@kernel.org \
    --to=jarkko@kernel.org \
    --cc=James.Bottomley@hansenpartnership.com \
    --cc=hdegoede@redhat.com \
    --cc=jsnitsel@redhat.com \
    --cc=linux-integrity@vger.kernel.org \
    --cc=mjg59@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.