All of lore.kernel.org
 help / color / mirror / Atom feed
From: Hans de Goede <hdegoede@redhat.com>
To: Jarkko Sakkinen <jarkko@kernel.org>
Cc: Jerry Snitselaar <jsnitsel@redhat.com>,
	Matthew Garrett <mjg59@google.com>,
	linux-integrity <linux-integrity@vger.kernel.org>,
	James Bottomley <James.Bottomley@hansenpartnership.com>
Subject: Re: Recent tpm_tis IRQ handling changes are causing kernel backtraces]
Date: Sat, 17 Jul 2021 18:10:04 +0200	[thread overview]
Message-ID: <fa61681f-b07f-9d31-b193-9d42cd23167a@redhat.com> (raw)
In-Reply-To: <20210709184426.7yftmufrmw3tp24s@kernel.org>

Hi,

On 7/9/21 8:44 PM, Jarkko Sakkinen wrote:
> On Wed, Jun 30, 2021 at 03:36:55PM +0200, Hans de Goede wrote:
>> Hi,
>>
>> On 6/30/21 2:47 PM, Hans de Goede wrote:
>>> Hi,
>>>
>>> On 6/30/21 12:05 AM, Jarkko Sakkinen wrote:
>>>> On Tue, Jun 29, 2021 at 09:14:39PM +0200, Hans de Goede wrote:
>>>>> Hi,
>>>>>
>>>>> On 6/29/21 8:04 PM, Jarkko Sakkinen wrote:
>>>>>> On Wed, Jun 23, 2021 at 03:54:59PM +0200, Hans de Goede wrote:
>>>>>>> Hi,
>>>>>>>
>>>>>>> On 6/23/21 3:40 PM, Jarkko Sakkinen wrote:
>>>>>>>> On Mon, Jun 21, 2021 at 02:04:52PM +0200, Hans de Goede wrote:
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> On 6/14/21 3:33 PM, Hans de Goede wrote:
>>>>>>>>>> Hi,
>>>>>>>>>>
>>>>>>>>>> On 6/1/21 6:04 PM, Hans de Goede wrote:
>>>>>>>>>>> Hi,
>>>>>>>>>>>
>>>>>>>>>>> On 5/31/21 6:36 AM, Jarkko Sakkinen wrote:
>>>>>>>>>>>>> Interestingly enough the first backtrace is also happening on a:
>>>>>>>>>>>>> "Dell Inc. XPS 13 9310/0MRT12, BIOS 2.2.0 04/06/2021"
>>>>>>>>>>>>>
>>>>>>>>>>>>> So it seems that at least with 5.12.6 (which has the last 2 fixes)
>>>>>>>>>>>>> all reports are about the XPS 13 9310. I wonder if there is an
>>>>>>>>>>>>> issue with the TPM interrupt line on the XPS 13 9310; I've asked the
>>>>>>>>>>>>> reporters to try adding tpm_tis.interrupts=0 to their kernel commandline.
>>>>>>>>>>>>
>>>>>>>>>>>> This is helpful for sure that these all are happening on matching hardware.
>>>>>>>>>>>
>>>>>>>>>>> So our kernel-backtrace tracking info (ABRT) just recorded a third backtrace
>>>>>>>>>>> with a kernel >= 5.12.6, again on the XPS 13 9310, so now we have 3 variants:
>>>>>>>>>>>
>>>>>>>>>>> 1. Backtrace starting with a call to ima_add_boot_aggregate
>>>>>>>>>>> https://bugzilla.redhat.com/show_bug.cgi?id=1963712
>>>>>>>>>>>
>>>>>>>>>>> 2. Backtrace starting with a call to tpm_dev_async_work:
>>>>>>>>>>> https://bugzilla.redhat.com/show_bug.cgi?id=1964974
>>>>>>>>>>> (note this one is not easily reproducible)
>>>>>>>>>>>
>>>>>>>>>>> 3. Backtrace starting with a call to rng_dev_read:
>>>>>>>>>>> https://bugzilla.redhat.com/show_bug.cgi?id=1920510
>>>>>>>>>>>
>>>>>>>>>>> 3. is the new one. All bugs linked above are public, all 3 backtraces
>>>>>>>>>>> so far have only been reported on the XPS 13 9310 (with kernel >= 5.12.6)
>>>>>>>>>>> and I've asked all the reporters to check if tpm_tis.interrupts=0 helps.
>>>>>>>>>>
>>>>>>>>>> Quick status update, I've got a response from a XPS 13 9310 user in:
>>>>>>>>>>
>>>>>>>>>> https://bugzilla.redhat.com/show_bug.cgi?id=1920510
>>>>>>>>>>
>>>>>>>>>> Indicating that a. he can reproduce this with the latest >= 5.12.6 kernels;
>>>>>>>>>> and b. it goes away when specifying tpm_tis.interrupts=0 as I expected
>>>>>>>>>> (I expected this because all the bug-reports started when the interrupt
>>>>>>>>>> code got fixed/re-enabled a while ago).
>>>>>>>>>
>>>>>>>>> One more status update.
>>>>>>>>>
>>>>>>>>> - A new 4th variant of the backtrace has been spotted, where the problem hits
>>>>>>>>> when called from probe() -> tpm2_auto_startup -> tpm2_do_selftest, see:
>>>>>>>>> https://bugzilla.redhat.com/show_bug.cgi?id=1958381
>>>>>>>>>
>>>>>>>>> - So far all reports with kernel >= 5.12.6 have been on a Dell XPS 13 9310
>>>>>>>>> models. But the new variant is happening on a Dell XPS 15 9500 and the
>>>>>>>>> backtrace starting at ima_add_boot_aggregate is also being reported on
>>>>>>>>> a Dell XPS 15 9500 (as well as on the XPS 13 9310).
>>>>>>>>>
>>>>>>>>> Regards,
>>>>>>>>>
>>>>>>>>> Hans
>>>>>>>>
>>>>>>>> OK, I'll have to query if I could borrow that laptop from someone. It's
>>>>>>>> fairly common laptop, i.e. might be possible.
>>>>>>>
>>>>>>> In the mean time I've also got a report that this variant of the backtrace:
>>>>>>>
>>>>>>> 1. Backtrace starting with a call to ima_add_boot_aggregate
>>>>>>> https://bugzilla.redhat.com/show_bug.cgi?id=1963712
>>>>>>>
>>>>>>> Is also still happening with recent 5.12.y kernels on
>>>>>>> Dell Precision 7750 laptops. Both the Precision 7750 and the XPS 9500 use
>>>>>>> 10th gen comet lake processors (i7-10750H), where as the XPS 9310 is using
>>>>>>> an icelake processor. So the common denominator seems to be that they are
>>>>>>> all 2020 Dell laptop models using the latest Intel CPUs.
>>>>>>>
>>>>>>> FYI the complete list of models on which some of the 4 backtrace variants
>>>>>>> are still seen on recent 5.12.y kernels is now:
>>>>>>>
>>>>>>> Dell XPS 13 9310
>>>>>>> Dell XPS 15 9500
>>>>>>> Dell Precision 7750
>>>>>>>
>>>>>>> Regards,
>>>>>>>
>>>>>>> Hans
>>>>>>
>>>>>> Does "tpm_tis.interrupts=0" uniformly workaround the issue?
>>>>>
>>>>> I unfortunately have not gotten much replies to my request to test with
>>>>> tpm_tis.interrupts=0, but for those people who have bothered to test
>>>>> (2 reporters IIRC) using tpm_tis.interrupts=0 does avoid the issue.
>>>>
>>>> So we see this in dmesg as first anything from TPM:
>>>>
>>>> [    0.904572] tpm_tis STM0125:00: 2.0 TPM (device-id 0x0, rev-id 78)
>>>>
>>>> This means that one command is successfully processed by the TPM, i.e.
>>>> tpm2_probe() in tpm_tis_core_init().
>>>>
>>>> My first *guess*  was that IRQ is given by ACPI, would need ACPI dump to
>>>> confirm (e.g. sudo acpidump > acpi.dump). It cannot be so because otherwise
>>>> this code path would be executed:
>>>>
>>>>         if (!(chip->flags & TPM_CHIP_FLAG_IRQ)) {
>>>>                 dev_err(&chip->dev, FW_BUG
>>>>                         "TPM interrupt not working, polling instead\n");
>>>>
>>>>                 disable_interrupts(chip);
>>>>         }
>>>>
>>>> TPM_CHIP_FLAG_IRQ is never set, so you should see this message in dmesg if
>>>> a legit value is given to IRQ by ACPI.  We are probably planning re-enable
>>>> IRQ code after these type of issues are fully resolved, but right now you
>>>> should not end up having it enabled (see tpm_tis_send() function).
>>>>
>>>> To put this together "if (irq != -1) {" path in tpm_tis_core_init() is
>>>> never executed. And early in the same function the interrupt hardware is
>>>> *explicitly* disabled.
>>>>
>>>> For me this looks like a hardware bug right now: interrupts stay enabled
>>>> for some reason.
>>>>
>>>> ACPI dump would be useful to verify some of the assumptions in this.
>>>
>>> Ok, I've added a comment to the Fedora bugs for the 4 different backtrace
>>> variants asking for acpidumps for the Dell XPS 13 9310, Dell XPS 15 9500
>>> and Dell Precision 7750 laptops.
>>
>> 2 XPS 9310 acpidumps have been attached to:
>>
>> https://bugzilla.redhat.com/show_bug.cgi?id=1920510
>> https://bugzilla.redhat.com/show_bug.cgi?id=1964974
>>
>> Note the reporter of the first bug mentions that he is no longer having
>> this issue, but we are definitely still getting reports for kernel version
>>> 5.12.6 (which has the last 2 fixes) from XPS 9310 users...
>>
>> Maybe there are different BIOS versions in play ? It might be interesting
>> to compare the 2 acpidumps...
> 
> ❯ diff -u ../tmp/ssdt7.dsl ../tmp2/ssdt7.dsl
> --- ../tmp/ssdt7.dsl	2021-07-09 21:32:06.473166420 +0300
> +++ ../tmp2/ssdt7.dsl	2021-07-09 21:33:45.065934469 +0300
> @@ -5,7 +5,7 @@
>   * 
>   * Disassembling to symbolic ASL+ operators
>   *
> - * Disassembly of ssdt7.dat, Fri Jul  9 21:32:06 2021
> + * Disassembly of ssdt7.dat, Fri Jul  9 21:33:45 2021
>   *
>   * Original Table Header:
>   *     Signature        "SSDT"
> @@ -121,7 +121,7 @@
>                      0xFED40000,         // Address Base
>                      0x00005000,         // Address Length
>                      )
> -                Interrupt (ResourceConsumer, Level, ActiveLow, Shared, ,, _Y58)
> +                Interrupt (ResourceConsumer, Level, ActiveLow, Shared, ,, _Y55)
>                  {
>                      0x0000000C,
>                  }
> @@ -141,7 +141,7 @@
>                  }
>                  Else
>                  {
> -                    CreateDWordField (RES0, \_SB.TPM._Y58._INT, LIRQ)  // _INT: Interrupts
> +                    CreateDWordField (RES0, \_SB.TPM._Y55._INT, LIRQ)  // _INT: Interrupts
>                      LIRQ = IRQN /* \_SB_.TPM_.IRQN */
>                      Return (RES0) /* \_SB_.TPM_.RES0 */
>                  }
> @@ -152,14 +152,14 @@
>                  If ((IRQN != Zero))
>                  {
>                      CreateDWordField (Arg0, 0x11, IRQ0)
> -                    CreateDWordField (RES0, \_SB.TPM._Y58._INT, LIRQ)  // _INT: Interrupts
> +                    CreateDWordField (RES0, \_SB.TPM._Y55._INT, LIRQ)  // _INT: Interrupts
>                      LIRQ = IRQ0 /* \_SB_.TPM_._SRS.IRQ0 */
>                      IRQN = IRQ0 /* \_SB_.TPM_._SRS.IRQ0 */
>                      CreateBitField (Arg0, 0x79, ITRG)
> -                    CreateBitField (RES0, \_SB.TPM._Y58._HE, LTRG)  // _HE_: High-Edge
> +                    CreateBitField (RES0, \_SB.TPM._Y55._HE, LTRG)  // _HE_: High-Edge
>                      LTRG = ITRG /* \_SB_.TPM_._SRS.ITRG */
>                      CreateBitField (Arg0, 0x7A, ILVL)
> -                    CreateBitField (RES0, \_SB.TPM._Y58._LL, LLVL)  // _LL_: Low Level
> +                    CreateBitField (RES0, \_SB.TPM._Y55._LL, LLVL)  // _LL_: Low Level
>                      LLVL = ILVL /* \_SB_.TPM_._SRS.ILVL */
>                      If ((((TID0 & 0x0F) == Zero) || ((TID0 & 0x0F
>                          ) == 0x0F)))
> 
> This delta from "acpidump for Dell XPS 9310 (with Qualcomm QCA6390)" to
> "acpidump output from a Dell XPS 13 9310 that no longer has a problem"
> in SSDT7. The bug I'm referring to is
> 
> https://bugzilla.redhat.com/show_bug.cgi?id=1920510
> 
> Looks to me just using a different label.

Yes, although I have the feeling that does indicate that their are possibly
other changes under the hood. The 0x00000000c interrupt referred to seems to
be an interrupt directly on the APIC, which means that it is a GPIO in direct-irq
mode with level vs edge interrupt mode selection (pullup/down settings is all
directly done by the BIOS.

I'm aware that misconfiguring those settings (which Linux cannot see) was
an issue with the TPM IRQ on some Lenovo models, maybe the same is going on
here; and later BIOS versions contain a fix (and this somehow also has
changed the label in the DSDT ?).

> What if we just set "interrupts=0" explicitly for STM0125 HID since the
> workaround seems to work according to the report?

That sounds like a reasonable workaround.

Regards,

Hans


  reply	other threads:[~2021-07-17 16:10 UTC|newest]

Thread overview: 29+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-02-11 13:09 Recent tpm_tis IRQ handling changes are causing kernel backtraces Hans de Goede
2021-03-16 15:34 ` Hans de Goede
2021-03-16 19:18   ` Jarkko Sakkinen
2021-05-08  9:07     ` Hans de Goede
2021-05-10 17:25       ` Jarkko Sakkinen
2021-05-11  8:37         ` Hans de Goede
2021-05-11 23:48           ` Jarkko Sakkinen
2021-05-26 19:03           ` Hans de Goede
2021-05-27 14:00             ` Jarkko Sakkinen
2021-05-27 15:27               ` Hans de Goede
     [not found]                 ` <20210531043616.u3v25qzkkrik5apq@kernel.org>
2021-05-31  8:24                   ` Hans de Goede
2021-06-01 18:02                     ` Jarkko Sakkinen
2021-06-01 16:04                   ` Hans de Goede
2021-06-01 18:03                     ` Jarkko Sakkinen
2021-06-14 13:33                     ` Hans de Goede
2021-06-15 13:01                       ` Jarkko Sakkinen
2021-06-15 13:59                         ` Hans de Goede
2021-06-23 13:37                           ` Jarkko Sakkinen
2021-06-21 12:04                       ` Hans de Goede
2021-06-23 13:40                         ` Jarkko Sakkinen
2021-06-23 13:54                           ` Hans de Goede
2021-06-29 18:04                             ` Recent tpm_tis IRQ handling changes are causing kernel backtraces] Jarkko Sakkinen
2021-06-29 19:14                               ` Hans de Goede
2021-06-29 22:05                                 ` Jarkko Sakkinen
2021-06-30 12:47                                   ` Hans de Goede
2021-06-30 13:36                                     ` Hans de Goede
2021-07-09 18:44                                       ` Jarkko Sakkinen
2021-07-17 16:10                                         ` Hans de Goede [this message]
2021-07-27  2:50                                           ` Jarkko Sakkinen

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=fa61681f-b07f-9d31-b193-9d42cd23167a@redhat.com \
    --to=hdegoede@redhat.com \
    --cc=James.Bottomley@hansenpartnership.com \
    --cc=jarkko@kernel.org \
    --cc=jsnitsel@redhat.com \
    --cc=linux-integrity@vger.kernel.org \
    --cc=mjg59@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.