linux-integrity.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* 5.3 boot regression caused by 5.3 TPM changes
@ 2019-08-04 10:00 Hans de Goede
  2019-08-04 15:33 ` Ard Biesheuvel
       [not found] ` <0d5bbfe6-a95e-987e-b436-83f754d044ac@canonical.com>
  0 siblings, 2 replies; 9+ messages in thread
From: Hans de Goede @ 2019-08-04 10:00 UTC (permalink / raw)
  To: Matthew Garrett, Ard Biesheuvel, Jarkko Sakkinen, Peter Huewe
  Cc: Greg Kroah-Hartman, Thomas Gleixner, Ingo Molnar,
	linux-integrity, Linux Kernel Mailing List, linux-efi

Hi All,

While testing 5.3-rc2 on an Irbis TW90 Intel Cherry Trail based
tablet I noticed that it does not boot on this device.

A git bisect points to commit 166a2809d65b ("tpm: Don't duplicate
events from the final event log in the TCG2 log")

And I can confirm that reverting just that single commit makes
the TW90 boot again.

This machine uses AptIO firmware with base component versions
of: UEFI 2.4 PI 1.3. I've tried to reproduce the problem on
a Teclast X80 Pro which is also CHT based and also uses AptIO
firmware with the same base components. But it does not reproduce
there. Neither does the problem reproduce on a CHT tablet using
InsideH20 based firmware.

Note that these devices have a software/firmware TPM-2.0
implementation, they do not have an actual TPM chip.

Comparing TPM firmware setting between the 2 AptIO based
tablets the settings are identical, but the troublesome
TW90 does have some more setting then the X80, it has
the following settings which are not shown on the X80:

Active PCR banks:           SHA-1         (read only)
Available PCR banks:        SHA-1,SHA256  (read only)
TPM2.0 UEFI SPEC version:   TCG_2         (other possible setting: TCG_1_2
Physical Presence SPEC ver: 1.2           (other possible setting: 1.3)

I have the feeling that at least the first 2 indicate that
the previous win10 installation has actually used the
TPM, where as on the X80 the TPM is uninitialized.
Note this is just a hunch I could be completely wrong.

I would be happy to run any commands to try and debug this
or to build a kernel with some patches to gather more info.

Note any kernel patches to printk some debug stuff need
to be based on 5.3 with 166a2809d65b reverted, without that
reverted the device will not boot, and thus I cannot collect
logs without it reverted.

Regards,

Hans

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: 5.3 boot regression caused by 5.3 TPM changes
  2019-08-04 10:00 5.3 boot regression caused by 5.3 TPM changes Hans de Goede
@ 2019-08-04 15:33 ` Ard Biesheuvel
  2019-08-04 16:12   ` Hans de Goede
       [not found] ` <0d5bbfe6-a95e-987e-b436-83f754d044ac@canonical.com>
  1 sibling, 1 reply; 9+ messages in thread
From: Ard Biesheuvel @ 2019-08-04 15:33 UTC (permalink / raw)
  To: Hans de Goede
  Cc: Matthew Garrett, Jarkko Sakkinen, Peter Huewe,
	Greg Kroah-Hartman, Thomas Gleixner, Ingo Molnar,
	linux-integrity, Linux Kernel Mailing List, linux-efi

Hi Hans,

On Sun, 4 Aug 2019 at 13:00, Hans de Goede <hdegoede@redhat.com> wrote:
>
> Hi All,
>
> While testing 5.3-rc2 on an Irbis TW90 Intel Cherry Trail based
> tablet I noticed that it does not boot on this device.
>
> A git bisect points to commit 166a2809d65b ("tpm: Don't duplicate
> events from the final event log in the TCG2 log")
>
> And I can confirm that reverting just that single commit makes
> the TW90 boot again.
>
> This machine uses AptIO firmware with base component versions
> of: UEFI 2.4 PI 1.3. I've tried to reproduce the problem on
> a Teclast X80 Pro which is also CHT based and also uses AptIO
> firmware with the same base components. But it does not reproduce
> there. Neither does the problem reproduce on a CHT tablet using
> InsideH20 based firmware.
>
> Note that these devices have a software/firmware TPM-2.0
> implementation, they do not have an actual TPM chip.
>
> Comparing TPM firmware setting between the 2 AptIO based
> tablets the settings are identical, but the troublesome
> TW90 does have some more setting then the X80, it has
> the following settings which are not shown on the X80:
>
> Active PCR banks:           SHA-1         (read only)
> Available PCR banks:        SHA-1,SHA256  (read only)
> TPM2.0 UEFI SPEC version:   TCG_2         (other possible setting: TCG_1_2
> Physical Presence SPEC ver: 1.2           (other possible setting: 1.3)
>
> I have the feeling that at least the first 2 indicate that
> the previous win10 installation has actually used the
> TPM, where as on the X80 the TPM is uninitialized.
> Note this is just a hunch I could be completely wrong.
>
> I would be happy to run any commands to try and debug this
> or to build a kernel with some patches to gather more info.
>
> Note any kernel patches to printk some debug stuff need
> to be based on 5.3 with 166a2809d65b reverted, without that
> reverted the device will not boot, and thus I cannot collect
> logs without it reverted.
>

Are you booting a 64-bit kernel on 32-bit firmware?

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: 5.3 boot regression caused by 5.3 TPM changes
  2019-08-04 15:33 ` Ard Biesheuvel
@ 2019-08-04 16:12   ` Hans de Goede
  2019-08-05 16:01     ` Ard Biesheuvel
  0 siblings, 1 reply; 9+ messages in thread
From: Hans de Goede @ 2019-08-04 16:12 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: Matthew Garrett, Jarkko Sakkinen, Peter Huewe,
	Greg Kroah-Hartman, Thomas Gleixner, Ingo Molnar,
	linux-integrity, Linux Kernel Mailing List, linux-efi

Hi,

On 04-08-19 17:33, Ard Biesheuvel wrote:
> Hi Hans,
> 
> On Sun, 4 Aug 2019 at 13:00, Hans de Goede <hdegoede@redhat.com> wrote:
>>
>> Hi All,
>>
>> While testing 5.3-rc2 on an Irbis TW90 Intel Cherry Trail based
>> tablet I noticed that it does not boot on this device.
>>
>> A git bisect points to commit 166a2809d65b ("tpm: Don't duplicate
>> events from the final event log in the TCG2 log")
>>
>> And I can confirm that reverting just that single commit makes
>> the TW90 boot again.
>>
>> This machine uses AptIO firmware with base component versions
>> of: UEFI 2.4 PI 1.3. I've tried to reproduce the problem on
>> a Teclast X80 Pro which is also CHT based and also uses AptIO
>> firmware with the same base components. But it does not reproduce
>> there. Neither does the problem reproduce on a CHT tablet using
>> InsideH20 based firmware.
>>
>> Note that these devices have a software/firmware TPM-2.0
>> implementation, they do not have an actual TPM chip.
>>
>> Comparing TPM firmware setting between the 2 AptIO based
>> tablets the settings are identical, but the troublesome
>> TW90 does have some more setting then the X80, it has
>> the following settings which are not shown on the X80:
>>
>> Active PCR banks:           SHA-1         (read only)
>> Available PCR banks:        SHA-1,SHA256  (read only)
>> TPM2.0 UEFI SPEC version:   TCG_2         (other possible setting: TCG_1_2
>> Physical Presence SPEC ver: 1.2           (other possible setting: 1.3)
>>
>> I have the feeling that at least the first 2 indicate that
>> the previous win10 installation has actually used the
>> TPM, where as on the X80 the TPM is uninitialized.
>> Note this is just a hunch I could be completely wrong.
>>
>> I would be happy to run any commands to try and debug this
>> or to build a kernel with some patches to gather more info.
>>
>> Note any kernel patches to printk some debug stuff need
>> to be based on 5.3 with 166a2809d65b reverted, without that
>> reverted the device will not boot, and thus I cannot collect
>> logs without it reverted.
>>
> 
> Are you booting a 64-bit kernel on 32-bit firmware?

Yes you are right, I must say that this is somewhat surprising
most Cherry Trail devices do use 64 bit firmware (where as Bay Trail
typically uses 32 bit). But I just checked efibootmgr output and it
says it is booting: \EFI\FEDORA\SHIMIA32.EFI so yeah 32 bit firmware.

Recent Fedora releases take care of this so seamlessly I did not
even realize...

Regards,

Hans

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: 5.3 boot regression caused by 5.3 TPM changes
  2019-08-04 16:12   ` Hans de Goede
@ 2019-08-05 16:01     ` Ard Biesheuvel
  2019-08-07 19:58       ` Hans de Goede
  2019-08-07 21:55       ` Hans de Goede
  0 siblings, 2 replies; 9+ messages in thread
From: Ard Biesheuvel @ 2019-08-05 16:01 UTC (permalink / raw)
  To: Hans de Goede
  Cc: Matthew Garrett, Jarkko Sakkinen, Peter Huewe,
	Greg Kroah-Hartman, Thomas Gleixner, Ingo Molnar,
	linux-integrity, Linux Kernel Mailing List, linux-efi

On Sun, 4 Aug 2019 at 19:12, Hans de Goede <hdegoede@redhat.com> wrote:
>
> Hi,
>
> On 04-08-19 17:33, Ard Biesheuvel wrote:
> > Hi Hans,
> >
> > On Sun, 4 Aug 2019 at 13:00, Hans de Goede <hdegoede@redhat.com> wrote:
> >>
> >> Hi All,
> >>
> >> While testing 5.3-rc2 on an Irbis TW90 Intel Cherry Trail based
> >> tablet I noticed that it does not boot on this device.
> >>
> >> A git bisect points to commit 166a2809d65b ("tpm: Don't duplicate
> >> events from the final event log in the TCG2 log")
> >>
> >> And I can confirm that reverting just that single commit makes
> >> the TW90 boot again.
> >>
> >> This machine uses AptIO firmware with base component versions
> >> of: UEFI 2.4 PI 1.3. I've tried to reproduce the problem on
> >> a Teclast X80 Pro which is also CHT based and also uses AptIO
> >> firmware with the same base components. But it does not reproduce
> >> there. Neither does the problem reproduce on a CHT tablet using
> >> InsideH20 based firmware.
> >>
> >> Note that these devices have a software/firmware TPM-2.0
> >> implementation, they do not have an actual TPM chip.
> >>
> >> Comparing TPM firmware setting between the 2 AptIO based
> >> tablets the settings are identical, but the troublesome
> >> TW90 does have some more setting then the X80, it has
> >> the following settings which are not shown on the X80:
> >>
> >> Active PCR banks:           SHA-1         (read only)
> >> Available PCR banks:        SHA-1,SHA256  (read only)
> >> TPM2.0 UEFI SPEC version:   TCG_2         (other possible setting: TCG_1_2
> >> Physical Presence SPEC ver: 1.2           (other possible setting: 1.3)
> >>
> >> I have the feeling that at least the first 2 indicate that
> >> the previous win10 installation has actually used the
> >> TPM, where as on the X80 the TPM is uninitialized.
> >> Note this is just a hunch I could be completely wrong.
> >>
> >> I would be happy to run any commands to try and debug this
> >> or to build a kernel with some patches to gather more info.
> >>
> >> Note any kernel patches to printk some debug stuff need
> >> to be based on 5.3 with 166a2809d65b reverted, without that
> >> reverted the device will not boot, and thus I cannot collect
> >> logs without it reverted.
> >>
> >
> > Are you booting a 64-bit kernel on 32-bit firmware?
>
> Yes you are right, I must say that this is somewhat surprising
> most Cherry Trail devices do use 64 bit firmware (where as Bay Trail
> typically uses 32 bit). But I just checked efibootmgr output and it
> says it is booting: \EFI\FEDORA\SHIMIA32.EFI so yeah 32 bit firmware.
>
> Recent Fedora releases take care of this so seamlessly I did not
> even realize...
>

OK, so we'll have to find out how this patch affects 64-bit code
running on 32-bit firmware. The only EFI call in that patch is
get_config_table(), which is not actually a EFI boot service call but
a EFI stub helper that parses the config table array in the EFI system
table.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: 5.3 boot regression caused by 5.3 TPM changes
       [not found] ` <0d5bbfe6-a95e-987e-b436-83f754d044ac@canonical.com>
@ 2019-08-06 19:27   ` Hans de Goede
  0 siblings, 0 replies; 9+ messages in thread
From: Hans de Goede @ 2019-08-06 19:27 UTC (permalink / raw)
  To: Chris Coulson, Matthew Garrett, Ard Biesheuvel, Jarkko Sakkinen,
	Peter Huewe
  Cc: Greg Kroah-Hartman, Thomas Gleixner, Ingo Molnar,
	linux-integrity, Linux Kernel Mailing List, linux-efi

Hi,

On 06-08-19 17:53, Chris Coulson wrote:
> Hi,
> 
> On 04/08/2019 11:00, Hans de Goede wrote:
>> Hi All,
>>
>> While testing 5.3-rc2 on an Irbis TW90 Intel Cherry Trail based
>> tablet I noticed that it does not boot on this device.
>>
>> A git bisect points to commit 166a2809d65b ("tpm: Don't duplicate
>> events from the final event log in the TCG2 log")
>>
>> And I can confirm that reverting just that single commit makes
>> the TW90 boot again.
>>
>> This machine uses AptIO firmware with base component versions
>> of: UEFI 2.4 PI 1.3. I've tried to reproduce the problem on
>> a Teclast X80 Pro which is also CHT based and also uses AptIO
>> firmware with the same base components. But it does not reproduce
>> there. Neither does the problem reproduce on a CHT tablet using
>> InsideH20 based firmware.
>>
>> Note that these devices have a software/firmware TPM-2.0
>> implementation, they do not have an actual TPM chip.
>>
>> Comparing TPM firmware setting between the 2 AptIO based
>> tablets the settings are identical, but the troublesome
>> TW90 does have some more setting then the X80, it has
>> the following settings which are not shown on the X80:
>>
>> Active PCR banks:           SHA-1         (read only)
>> Available PCR banks:        SHA-1,SHA256  (read only)
>> TPM2.0 UEFI SPEC version:   TCG_2         (other possible setting: TCG_1_2
>> Physical Presence SPEC ver: 1.2           (other possible setting: 1.3)
>>
>> I have the feeling that at least the first 2 indicate that
>> the previous win10 installation has actually used the
>> TPM, where as on the X80 the TPM is uninitialized.
>> Note this is just a hunch I could be completely wrong.
>>
>> I would be happy to run any commands to try and debug this
>> or to build a kernel with some patches to gather more info.
>>
>> Note any kernel patches to printk some debug stuff need
>> to be based on 5.3 with 166a2809d65b reverted, without that
>> reverted the device will not boot, and thus I cannot collect
>> logs without it reverted.
>>
>> Regards,
>>
>> Hans
> Do you think this might be the same issue as https://marc.info/?l=linux-integrity&m=155968949020639 <https://marc.info/?l=linux-integrity&m=155968949020639&w=2>?

I was hoping it would be the same issue, so I tested a 5.3 kernel
with that patch added, but unfortunately it still crashes on
the Irbis TW90.

Regards,

Hans


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: 5.3 boot regression caused by 5.3 TPM changes
  2019-08-05 16:01     ` Ard Biesheuvel
@ 2019-08-07 19:58       ` Hans de Goede
  2019-08-07 20:13         ` Hans de Goede
  2019-08-07 21:55       ` Hans de Goede
  1 sibling, 1 reply; 9+ messages in thread
From: Hans de Goede @ 2019-08-07 19:58 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: Matthew Garrett, Jarkko Sakkinen, Peter Huewe,
	Greg Kroah-Hartman, Thomas Gleixner, Ingo Molnar,
	linux-integrity, Linux Kernel Mailing List, linux-efi

Hi,

On 05-08-19 18:01, Ard Biesheuvel wrote:
> On Sun, 4 Aug 2019 at 19:12, Hans de Goede <hdegoede@redhat.com> wrote:
>>
>> Hi,
>>
>> On 04-08-19 17:33, Ard Biesheuvel wrote:
>>> Hi Hans,
>>>
>>> On Sun, 4 Aug 2019 at 13:00, Hans de Goede <hdegoede@redhat.com> wrote:
>>>>
>>>> Hi All,
>>>>
>>>> While testing 5.3-rc2 on an Irbis TW90 Intel Cherry Trail based
>>>> tablet I noticed that it does not boot on this device.
>>>>
>>>> A git bisect points to commit 166a2809d65b ("tpm: Don't duplicate
>>>> events from the final event log in the TCG2 log")
>>>>
>>>> And I can confirm that reverting just that single commit makes
>>>> the TW90 boot again.
>>>>
>>>> This machine uses AptIO firmware with base component versions
>>>> of: UEFI 2.4 PI 1.3. I've tried to reproduce the problem on
>>>> a Teclast X80 Pro which is also CHT based and also uses AptIO
>>>> firmware with the same base components. But it does not reproduce
>>>> there. Neither does the problem reproduce on a CHT tablet using
>>>> InsideH20 based firmware.
>>>>
>>>> Note that these devices have a software/firmware TPM-2.0
>>>> implementation, they do not have an actual TPM chip.
>>>>
>>>> Comparing TPM firmware setting between the 2 AptIO based
>>>> tablets the settings are identical, but the troublesome
>>>> TW90 does have some more setting then the X80, it has
>>>> the following settings which are not shown on the X80:
>>>>
>>>> Active PCR banks:           SHA-1         (read only)
>>>> Available PCR banks:        SHA-1,SHA256  (read only)
>>>> TPM2.0 UEFI SPEC version:   TCG_2         (other possible setting: TCG_1_2
>>>> Physical Presence SPEC ver: 1.2           (other possible setting: 1.3)
>>>>
>>>> I have the feeling that at least the first 2 indicate that
>>>> the previous win10 installation has actually used the
>>>> TPM, where as on the X80 the TPM is uninitialized.
>>>> Note this is just a hunch I could be completely wrong.
>>>>
>>>> I would be happy to run any commands to try and debug this
>>>> or to build a kernel with some patches to gather more info.
>>>>
>>>> Note any kernel patches to printk some debug stuff need
>>>> to be based on 5.3 with 166a2809d65b reverted, without that
>>>> reverted the device will not boot, and thus I cannot collect
>>>> logs without it reverted.
>>>>
>>>
>>> Are you booting a 64-bit kernel on 32-bit firmware?
>>
>> Yes you are right, I must say that this is somewhat surprising
>> most Cherry Trail devices do use 64 bit firmware (where as Bay Trail
>> typically uses 32 bit). But I just checked efibootmgr output and it
>> says it is booting: \EFI\FEDORA\SHIMIA32.EFI so yeah 32 bit firmware.
>>
>> Recent Fedora releases take care of this so seamlessly I did not
>> even realize...
>>
> 
> OK, so we'll have to find out how this patch affects 64-bit code
> running on 32-bit firmware.

I was not sure this really is a 32 bit firmware issue, as I believed
I saw 5.3 running fine on other 32 bit firmware devices, so I tried
this on another device with 32 bit UEFI and you're right this is a
32 bit issue.

> The only EFI call in that patch is
> get_config_table(), which is not actually a EFI boot service call but
> a EFI stub helper that parses the config table array in the EFI system
> table.

Well get_efi_config_table() is a new function in 5.3, introduced
specifically for the 166a2809d65b ("tpm: Don't duplicate events from the
final event log in the TCG2 log") commit.

It was introduced in commit 82d736ac56d7 ("Abstract out support for
locating an EFI config table") and after taking a good look at this
commit I'm pretty confident to say that the new get_efi_config_table()
function is the problem, as it is broken in multiple ways.

In itself the new get_efi_config_table() is just factoring out some
code used in drivers/firmware/efi/libstub/fdt.c into a new helper
for reuse and not making any functional changes to the factored out
code.

The problem is that the old code which it factors out contains a number
of assumptions which are true in the get_fdt() context from which it
was called but are not true when used in more generic code as is done
from the 166a2809d65b ("tpm: Don't duplicate events from the
final event log in the TCG2 log") commit.

There are 2 problems with the new get_efi_config_table() function:

1) sys_table->tables contains a physical address, we cannot just
cast that to a pointer and deref it, it needs to be early_memremap-ed
and then we deref the mapping. I'm somewhat amazed that this works
at all for the 64 bit case, but apparently it does.

2) sys_table->tables points to either an array of either
efi_config_table_64_t structd or an array of efi_config_table_32_t
structs.  efi_config_table_t is a generic type for storing information
when parsing it should NOT be used for reading the actual tables
as they come from the firmware when parsing! Now efi_config_table_t
happens to be an exact match for efi_config_table_64_t when building
an x86_64 kernel, so this happens to work for the 64 bit firmware case.

The properway to deal with this would be to use the existing
efi_config_parse_tables() functionality from drivers/firmware/efi/efi.c
by adding entry for the LINUX_EFI_TPM_FINAL_LOG_GUID to the
common_tables[] array in drivers/firmware/efi/efi.c and make that
entry store the table address (if found) in a new efi.final_log
member.

I'm not sure how important this functionality is to have in 5.3.

I will try to come up with a fix for this using efi_config_parse_tables()
but it might be better to just revert for 5.3 .

Regards,

Hans


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: 5.3 boot regression caused by 5.3 TPM changes
  2019-08-07 19:58       ` Hans de Goede
@ 2019-08-07 20:13         ` Hans de Goede
  2019-08-07 20:40           ` Hans de Goede
  0 siblings, 1 reply; 9+ messages in thread
From: Hans de Goede @ 2019-08-07 20:13 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: Matthew Garrett, Jarkko Sakkinen, Peter Huewe,
	Greg Kroah-Hartman, Thomas Gleixner, Ingo Molnar,
	linux-integrity, Linux Kernel Mailing List, linux-efi

Hi,

On 07-08-19 21:58, Hans de Goede wrote:
> Hi,
> 
> On 05-08-19 18:01, Ard Biesheuvel wrote:
>> On Sun, 4 Aug 2019 at 19:12, Hans de Goede <hdegoede@redhat.com> wrote:
>>>
>>> Hi,
>>>
>>> On 04-08-19 17:33, Ard Biesheuvel wrote:
>>>> Hi Hans,
>>>>
>>>> On Sun, 4 Aug 2019 at 13:00, Hans de Goede <hdegoede@redhat.com> wrote:
>>>>>
>>>>> Hi All,
>>>>>
>>>>> While testing 5.3-rc2 on an Irbis TW90 Intel Cherry Trail based
>>>>> tablet I noticed that it does not boot on this device.
>>>>>
>>>>> A git bisect points to commit 166a2809d65b ("tpm: Don't duplicate
>>>>> events from the final event log in the TCG2 log")
>>>>>
>>>>> And I can confirm that reverting just that single commit makes
>>>>> the TW90 boot again.
>>>>>
>>>>> This machine uses AptIO firmware with base component versions
>>>>> of: UEFI 2.4 PI 1.3. I've tried to reproduce the problem on
>>>>> a Teclast X80 Pro which is also CHT based and also uses AptIO
>>>>> firmware with the same base components. But it does not reproduce
>>>>> there. Neither does the problem reproduce on a CHT tablet using
>>>>> InsideH20 based firmware.
>>>>>
>>>>> Note that these devices have a software/firmware TPM-2.0
>>>>> implementation, they do not have an actual TPM chip.
>>>>>
>>>>> Comparing TPM firmware setting between the 2 AptIO based
>>>>> tablets the settings are identical, but the troublesome
>>>>> TW90 does have some more setting then the X80, it has
>>>>> the following settings which are not shown on the X80:
>>>>>
>>>>> Active PCR banks:           SHA-1         (read only)
>>>>> Available PCR banks:        SHA-1,SHA256  (read only)
>>>>> TPM2.0 UEFI SPEC version:   TCG_2         (other possible setting: TCG_1_2
>>>>> Physical Presence SPEC ver: 1.2           (other possible setting: 1.3)
>>>>>
>>>>> I have the feeling that at least the first 2 indicate that
>>>>> the previous win10 installation has actually used the
>>>>> TPM, where as on the X80 the TPM is uninitialized.
>>>>> Note this is just a hunch I could be completely wrong.
>>>>>
>>>>> I would be happy to run any commands to try and debug this
>>>>> or to build a kernel with some patches to gather more info.
>>>>>
>>>>> Note any kernel patches to printk some debug stuff need
>>>>> to be based on 5.3 with 166a2809d65b reverted, without that
>>>>> reverted the device will not boot, and thus I cannot collect
>>>>> logs without it reverted.
>>>>>
>>>>
>>>> Are you booting a 64-bit kernel on 32-bit firmware?
>>>
>>> Yes you are right, I must say that this is somewhat surprising
>>> most Cherry Trail devices do use 64 bit firmware (where as Bay Trail
>>> typically uses 32 bit). But I just checked efibootmgr output and it
>>> says it is booting: \EFI\FEDORA\SHIMIA32.EFI so yeah 32 bit firmware.
>>>
>>> Recent Fedora releases take care of this so seamlessly I did not
>>> even realize...
>>>
>>
>> OK, so we'll have to find out how this patch affects 64-bit code
>> running on 32-bit firmware.
> 
> I was not sure this really is a 32 bit firmware issue, as I believed
> I saw 5.3 running fine on other 32 bit firmware devices, so I tried
> this on another device with 32 bit UEFI and you're right this is a
> 32 bit issue.
> 
>> The only EFI call in that patch is
>> get_config_table(), which is not actually a EFI boot service call but
>> a EFI stub helper that parses the config table array in the EFI system
>> table.
> 
> Well get_efi_config_table() is a new function in 5.3, introduced
> specifically for the 166a2809d65b ("tpm: Don't duplicate events from the
> final event log in the TCG2 log") commit.
> 
> It was introduced in commit 82d736ac56d7 ("Abstract out support for
> locating an EFI config table") and after taking a good look at this
> commit I'm pretty confident to say that the new get_efi_config_table()
> function is the problem, as it is broken in multiple ways.
> 
> In itself the new get_efi_config_table() is just factoring out some
> code used in drivers/firmware/efi/libstub/fdt.c into a new helper
> for reuse and not making any functional changes to the factored out
> code.
> 
> The problem is that the old code which it factors out contains a number
> of assumptions which are true in the get_fdt() context from which it
> was called but are not true when used in more generic code as is done
> from the 166a2809d65b ("tpm: Don't duplicate events from the
> final event log in the TCG2 log") commit.
> 
> There are 2 problems with the new get_efi_config_table() function:
> 
> 1) sys_table->tables contains a physical address, we cannot just
> cast that to a pointer and deref it, it needs to be early_memremap-ed
> and then we deref the mapping. I'm somewhat amazed that this works
> at all for the 64 bit case, but apparently it does.
> 
> 2) sys_table->tables points to either an array of either
> efi_config_table_64_t structd or an array of efi_config_table_32_t
> structs.  efi_config_table_t is a generic type for storing information
> when parsing it should NOT be used for reading the actual tables
> as they come from the firmware when parsing! Now efi_config_table_t
> happens to be an exact match for efi_config_table_64_t when building
> an x86_64 kernel, so this happens to work for the 64 bit firmware case.
> 
> The properway to deal with this would be to use the existing
> efi_config_parse_tables() functionality from drivers/firmware/efi/efi.c
> by adding entry for the LINUX_EFI_TPM_FINAL_LOG_GUID to the
> common_tables[] array in drivers/firmware/efi/efi.c and make that
> entry store the table address (if found) in a new efi.final_log
> member.

There actually already is a efi.tpm_final_log member where the
table's physical address is waiting for us all pre-parsed and ready
to use ...

> I'm not sure how important this functionality is to have in 5.3.
> 
> I will try to come up with a fix for this using efi_config_parse_tables()
> but it might be better to just revert for 5.3 .

Regards,

Hans

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: 5.3 boot regression caused by 5.3 TPM changes
  2019-08-07 20:13         ` Hans de Goede
@ 2019-08-07 20:40           ` Hans de Goede
  0 siblings, 0 replies; 9+ messages in thread
From: Hans de Goede @ 2019-08-07 20:40 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: Matthew Garrett, Jarkko Sakkinen, Peter Huewe,
	Greg Kroah-Hartman, Thomas Gleixner, Ingo Molnar,
	linux-integrity, Linux Kernel Mailing List, linux-efi

Hi,

On 07-08-19 22:13, Hans de Goede wrote:
> Hi,
> 
> On 07-08-19 21:58, Hans de Goede wrote:
>> Hi,
>>
>> On 05-08-19 18:01, Ard Biesheuvel wrote:
>>> On Sun, 4 Aug 2019 at 19:12, Hans de Goede <hdegoede@redhat.com> wrote:
>>>>
>>>> Hi,
>>>>
>>>> On 04-08-19 17:33, Ard Biesheuvel wrote:
>>>>> Hi Hans,
>>>>>
>>>>> On Sun, 4 Aug 2019 at 13:00, Hans de Goede <hdegoede@redhat.com> wrote:
>>>>>>
>>>>>> Hi All,
>>>>>>
>>>>>> While testing 5.3-rc2 on an Irbis TW90 Intel Cherry Trail based
>>>>>> tablet I noticed that it does not boot on this device.
>>>>>>
>>>>>> A git bisect points to commit 166a2809d65b ("tpm: Don't duplicate
>>>>>> events from the final event log in the TCG2 log")
>>>>>>
>>>>>> And I can confirm that reverting just that single commit makes
>>>>>> the TW90 boot again.
>>>>>>
>>>>>> This machine uses AptIO firmware with base component versions
>>>>>> of: UEFI 2.4 PI 1.3. I've tried to reproduce the problem on
>>>>>> a Teclast X80 Pro which is also CHT based and also uses AptIO
>>>>>> firmware with the same base components. But it does not reproduce
>>>>>> there. Neither does the problem reproduce on a CHT tablet using
>>>>>> InsideH20 based firmware.
>>>>>>
>>>>>> Note that these devices have a software/firmware TPM-2.0
>>>>>> implementation, they do not have an actual TPM chip.
>>>>>>
>>>>>> Comparing TPM firmware setting between the 2 AptIO based
>>>>>> tablets the settings are identical, but the troublesome
>>>>>> TW90 does have some more setting then the X80, it has
>>>>>> the following settings which are not shown on the X80:
>>>>>>
>>>>>> Active PCR banks:           SHA-1         (read only)
>>>>>> Available PCR banks:        SHA-1,SHA256  (read only)
>>>>>> TPM2.0 UEFI SPEC version:   TCG_2         (other possible setting: TCG_1_2
>>>>>> Physical Presence SPEC ver: 1.2           (other possible setting: 1.3)
>>>>>>
>>>>>> I have the feeling that at least the first 2 indicate that
>>>>>> the previous win10 installation has actually used the
>>>>>> TPM, where as on the X80 the TPM is uninitialized.
>>>>>> Note this is just a hunch I could be completely wrong.
>>>>>>
>>>>>> I would be happy to run any commands to try and debug this
>>>>>> or to build a kernel with some patches to gather more info.
>>>>>>
>>>>>> Note any kernel patches to printk some debug stuff need
>>>>>> to be based on 5.3 with 166a2809d65b reverted, without that
>>>>>> reverted the device will not boot, and thus I cannot collect
>>>>>> logs without it reverted.
>>>>>>
>>>>>
>>>>> Are you booting a 64-bit kernel on 32-bit firmware?
>>>>
>>>> Yes you are right, I must say that this is somewhat surprising
>>>> most Cherry Trail devices do use 64 bit firmware (where as Bay Trail
>>>> typically uses 32 bit). But I just checked efibootmgr output and it
>>>> says it is booting: \EFI\FEDORA\SHIMIA32.EFI so yeah 32 bit firmware.
>>>>
>>>> Recent Fedora releases take care of this so seamlessly I did not
>>>> even realize...
>>>>
>>>
>>> OK, so we'll have to find out how this patch affects 64-bit code
>>> running on 32-bit firmware.
>>
>> I was not sure this really is a 32 bit firmware issue, as I believed
>> I saw 5.3 running fine on other 32 bit firmware devices, so I tried
>> this on another device with 32 bit UEFI and you're right this is a
>> 32 bit issue.
>>
>>> The only EFI call in that patch is
>>> get_config_table(), which is not actually a EFI boot service call but
>>> a EFI stub helper that parses the config table array in the EFI system
>>> table.
>>
>> Well get_efi_config_table() is a new function in 5.3, introduced
>> specifically for the 166a2809d65b ("tpm: Don't duplicate events from the
>> final event log in the TCG2 log") commit.
>>
>> It was introduced in commit 82d736ac56d7 ("Abstract out support for
>> locating an EFI config table") and after taking a good look at this
>> commit I'm pretty confident to say that the new get_efi_config_table()
>> function is the problem, as it is broken in multiple ways.
>>
>> In itself the new get_efi_config_table() is just factoring out some
>> code used in drivers/firmware/efi/libstub/fdt.c into a new helper
>> for reuse and not making any functional changes to the factored out
>> code.
>>
>> The problem is that the old code which it factors out contains a number
>> of assumptions which are true in the get_fdt() context from which it
>> was called but are not true when used in more generic code as is done
>> from the 166a2809d65b ("tpm: Don't duplicate events from the
>> final event log in the TCG2 log") commit.
>>
>> There are 2 problems with the new get_efi_config_table() function:
>>
>> 1) sys_table->tables contains a physical address, we cannot just
>> cast that to a pointer and deref it, it needs to be early_memremap-ed
>> and then we deref the mapping. I'm somewhat amazed that this works
>> at all for the 64 bit case, but apparently it does.
>>
>> 2) sys_table->tables points to either an array of either
>> efi_config_table_64_t structd or an array of efi_config_table_32_t
>> structs.  efi_config_table_t is a generic type for storing information
>> when parsing it should NOT be used for reading the actual tables
>> as they come from the firmware when parsing! Now efi_config_table_t
>> happens to be an exact match for efi_config_table_64_t when building
>> an x86_64 kernel, so this happens to work for the 64 bit firmware case.
>>
>> The properway to deal with this would be to use the existing
>> efi_config_parse_tables() functionality from drivers/firmware/efi/efi.c
>> by adding entry for the LINUX_EFI_TPM_FINAL_LOG_GUID to the
>> common_tables[] array in drivers/firmware/efi/efi.c and make that
>> entry store the table address (if found) in a new efi.final_log
>> member.
> 
> There actually already is a efi.tpm_final_log member where the
> table's physical address is waiting for us all pre-parsed and ready
> to use ...

Oops, I missed the code in question is running really early during
boot, so please forget most of my rambling above, as it is wrong
given how early during boot this is happening.

Regards,

Hans


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: 5.3 boot regression caused by 5.3 TPM changes
  2019-08-05 16:01     ` Ard Biesheuvel
  2019-08-07 19:58       ` Hans de Goede
@ 2019-08-07 21:55       ` Hans de Goede
  1 sibling, 0 replies; 9+ messages in thread
From: Hans de Goede @ 2019-08-07 21:55 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: Matthew Garrett, Jarkko Sakkinen, Peter Huewe,
	Greg Kroah-Hartman, Thomas Gleixner, Ingo Molnar,
	linux-integrity, Linux Kernel Mailing List, linux-efi

Hi,

On 05-08-19 18:01, Ard Biesheuvel wrote:
> On Sun, 4 Aug 2019 at 19:12, Hans de Goede <hdegoede@redhat.com> wrote:
>>
>> Hi,
>>
>> On 04-08-19 17:33, Ard Biesheuvel wrote:
>>> Hi Hans,
>>>
>>> On Sun, 4 Aug 2019 at 13:00, Hans de Goede <hdegoede@redhat.com> wrote:
>>>>
>>>> Hi All,
>>>>
>>>> While testing 5.3-rc2 on an Irbis TW90 Intel Cherry Trail based
>>>> tablet I noticed that it does not boot on this device.
>>>>
>>>> A git bisect points to commit 166a2809d65b ("tpm: Don't duplicate
>>>> events from the final event log in the TCG2 log")
>>>>
>>>> And I can confirm that reverting just that single commit makes
>>>> the TW90 boot again.
>>>>
>>>> This machine uses AptIO firmware with base component versions
>>>> of: UEFI 2.4 PI 1.3. I've tried to reproduce the problem on
>>>> a Teclast X80 Pro which is also CHT based and also uses AptIO
>>>> firmware with the same base components. But it does not reproduce
>>>> there. Neither does the problem reproduce on a CHT tablet using
>>>> InsideH20 based firmware.
>>>>
>>>> Note that these devices have a software/firmware TPM-2.0
>>>> implementation, they do not have an actual TPM chip.
>>>>
>>>> Comparing TPM firmware setting between the 2 AptIO based
>>>> tablets the settings are identical, but the troublesome
>>>> TW90 does have some more setting then the X80, it has
>>>> the following settings which are not shown on the X80:
>>>>
>>>> Active PCR banks:           SHA-1         (read only)
>>>> Available PCR banks:        SHA-1,SHA256  (read only)
>>>> TPM2.0 UEFI SPEC version:   TCG_2         (other possible setting: TCG_1_2
>>>> Physical Presence SPEC ver: 1.2           (other possible setting: 1.3)
>>>>
>>>> I have the feeling that at least the first 2 indicate that
>>>> the previous win10 installation has actually used the
>>>> TPM, where as on the X80 the TPM is uninitialized.
>>>> Note this is just a hunch I could be completely wrong.
>>>>
>>>> I would be happy to run any commands to try and debug this
>>>> or to build a kernel with some patches to gather more info.
>>>>
>>>> Note any kernel patches to printk some debug stuff need
>>>> to be based on 5.3 with 166a2809d65b reverted, without that
>>>> reverted the device will not boot, and thus I cannot collect
>>>> logs without it reverted.
>>>>
>>>
>>> Are you booting a 64-bit kernel on 32-bit firmware?
>>
>> Yes you are right, I must say that this is somewhat surprising
>> most Cherry Trail devices do use 64 bit firmware (where as Bay Trail
>> typically uses 32 bit). But I just checked efibootmgr output and it
>> says it is booting: \EFI\FEDORA\SHIMIA32.EFI so yeah 32 bit firmware.
>>
>> Recent Fedora releases take care of this so seamlessly I did not
>> even realize...
>>
> 
> OK, so we'll have to find out how this patch affects 64-bit code
> running on 32-bit firmware. The only EFI call in that patch is
> get_config_table(), which is not actually a EFI boot service call but
> a EFI stub helper that parses the config table array in the EFI system
> table.

Ok, the problem indeed is the new get_efi_config_table() helper, it
does not make any calls, but it does interpret some structs which
have different sized members depending on if the firmware is 32 or 64 bit.

I've prepared a patch fixing this which I will send out after this mail.

Regards,

Hans


^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2019-08-07 21:55 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-08-04 10:00 5.3 boot regression caused by 5.3 TPM changes Hans de Goede
2019-08-04 15:33 ` Ard Biesheuvel
2019-08-04 16:12   ` Hans de Goede
2019-08-05 16:01     ` Ard Biesheuvel
2019-08-07 19:58       ` Hans de Goede
2019-08-07 20:13         ` Hans de Goede
2019-08-07 20:40           ` Hans de Goede
2019-08-07 21:55       ` Hans de Goede
     [not found] ` <0d5bbfe6-a95e-987e-b436-83f754d044ac@canonical.com>
2019-08-06 19:27   ` Hans de Goede

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).