linux-integrity.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Hao Wu <hao.wu@rubrik.com>
To: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>,
	James Bottomley <James.Bottomley@hansenpartnership.com>,
	peterhuewe@gmx.de, jgg@ziepe.ca
Cc: Nayna <nayna@linux.vnet.ibm.com>,
	arnd@arndb.de, gregkh@linuxfoundation.org,
	Hamza Attak <hamza@hpe.com>,
	why2jjj.linux@gmail.com, zohar@linux.vnet.ibm.com,
	linux-integrity@vger.kernel.org,
	Paul Menzel <pmenzel@molgen.mpg.de>,
	Ken Goldman <kgold@linux.ibm.com>,
	Seungyeop Han <seungyeop.han@rubrik.com>,
	Shrihari Kalkar <shrihari.kalkar@rubrik.com>,
	Anish Jhaveri <anish.jhaveri@rubrik.com>
Subject: Re: [PATCH] Fix Atmel TPM crash caused by too frequent queries
Date: Sat, 8 May 2021 23:31:06 -0700	[thread overview]
Message-ID: <12157F1B-93CC-4991-813D-B78A608D1C4F@rubrik.com> (raw)
In-Reply-To: <B55FCFAC-BFBD-4568-AB2F-45DD7BE10D6D@rubrik.com>

> On May 8, 2021, at 11:18 PM, Hao Wu <hao.wu@rubrik.com> wrote:
> 
>> On Nov 18, 2020, at 1:11 PM, Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com> wrote:
>> 
>> On Fri, Nov 13, 2020 at 08:39:28PM -0800, Hao Wu wrote:
>>>> On Oct 17, 2020, at 10:20 PM, Hao Wu <hao.wu@rubrik.com> wrote:
>>>> 
>>>>> On Oct 17, 2020, at 10:09 PM, Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com> wrote:
>>>>> 
>>>>> On Fri, Oct 16, 2020 at 11:11:37PM -0700, Hao Wu wrote:
>>>>>>> On Oct 1, 2020, at 4:04 PM, Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com> wrote:
>>>>>>> 
>>>>>>> On Thu, Oct 01, 2020 at 11:32:59AM -0700, James Bottomley wrote:
>>>>>>>> On Thu, 2020-10-01 at 14:15 -0400, Nayna wrote:
>>>>>>>>> On 10/1/20 12:53 AM, James Bottomley wrote:
>>>>>>>>>> On Thu, 2020-10-01 at 04:50 +0300, Jarkko Sakkinen wrote:
>>>>>>>>>>> On Wed, Sep 30, 2020 at 03:31:20PM -0700, James Bottomley wrote:
>>>>>>>>>>>> On Thu, 2020-10-01 at 00:09 +0300, Jarkko Sakkinen wrote:
>>>>>>>> [...]
>>>>>>>>>>>>> I also wonder if we could adjust the frequency dynamically.
>>>>>>>>>>>>> I.e. start with optimistic value and lower it until finding
>>>>>>>>>>>>> the sweet spot.
>>>>>>>>>>>> 
>>>>>>>>>>>> The problem is the way this crashes: the TPM seems to be
>>>>>>>>>>>> unrecoverable. If it were recoverable without a hard reset of
>>>>>>>>>>>> the entire machine, we could certainly play around with it.  I
>>>>>>>>>>>> can try alternative mechanisms to see if anything's viable, but
>>>>>>>>>>>> to all intents and purposes, it looks like my TPM simply stops
>>>>>>>>>>>> responding to the TIS interface.
>>>>>>>>>>> 
>>>>>>>>>>> A quickly scraped idea probably with some holes in it but I was
>>>>>>>>>>> thinking something like
>>>>>>>>>>> 
>>>>>>>>>>> 1. Initially set slow value for latency, this could be the
>>>>>>>>>>> original 15 ms.
>>>>>>>>>>> 2. Use this to read TPM_PT_VENDOR_STRING_*.
>>>>>>>>>>> 3. Lookup based vendor string from a fixup table a latency that
>>>>>>>>>>> works
>>>>>>>>>>> (the fallback latency could be the existing latency).
>>>>>>>>>> 
>>>>>>>>>> Well, yes, that was sort of what I was thinking of doing for the
>>>>>>>>>> Atmel ... except I was thinking of using the TIS VID (16 byte
>>>>>>>>>> assigned vendor ID) which means we can get the information to set
>>>>>>>>>> the timeout before we have to do any TPM operations.
>>>>>>>>> 
>>>>>>>>> I wonder if the timeout issue exists for all TPM commands for the
>>>>>>>>> same manufacturer.  For example, does the ATMEL TPM also crash when 
>>>>>>>>> extending  PCRs ?
>>>>>>>>> 
>>>>>>>>> In addition to defining a per TPM vendor based lookup table for
>>>>>>>>> timeout, would it be a good idea to also define a Kconfig/boot param
>>>>>>>>> option to allow timeout setting.  This will enable to set the timeout
>>>>>>>>> based on the specific use.
>>>>>>>> 
>>>>>>>> I don't think we need go that far (yet).  The timing change has been in
>>>>>>>> upstream since:
>>>>>>>> 
>>>>>>>> commit 424eaf910c329ab06ad03a527ef45dcf6a328f00
>>>>>>>> Author: Nayna Jain <nayna@linux.vnet.ibm.com>
>>>>>>>> Date:   Wed May 16 01:51:25 2018 -0400
>>>>>>>> 
>>>>>>>> tpm: reduce polling time to usecs for even finer granularity
>>>>>>>> 
>>>>>>>> Which was in the released kernel 4.18: over two years ago.  In all that
>>>>>>>> time we've discovered two problems: mine which looks to be an artifact
>>>>>>>> of an experimental upgrade process in a new nuvoton and the Atmel. 
>>>>>>>> That means pretty much every other TPM simply works with the existing
>>>>>>>> timings
>>>>>>>> 
>>>>>>>>> I was also thinking how will we decide the lookup table values for
>>>>>>>>> each vendor ?
>>>>>>>> 
>>>>>>>> I wasn't thinking we would.  I was thinking I'd do a simple exception
>>>>>>>> for the Atmel and nothing else.  I don't think my Nuvoton is in any way
>>>>>>>> characteristic.  Indeed my pluggable TPM rainbow bridge system works
>>>>>>>> just fine with a Nuvoton and the current timings.
>>>>>>>> 
>>>>>>>> We can add additional exceptions if they actually turn up.
>>>>>>> 
>>>>>>> I'd add a table and fallback.
>>>>>>> 
>>>>>> 
>>>>>> Hi folks,
>>>>>> 
>>>>>> I want to follow up this a bit and check whether we reached a consensus 
>>>>>> on how to fix the timeout issue for Atmel chip.
>>>>>> 
>>>>>> Should we revert the changes or introduce the lookup table for chips.
>>>>>> 
>>>>>> Is there anything I can help from Rubrik side.
>>>>>> 
>>>>>> Thanks
>>>>>> Hao
>>>>> 
>>>>> There is nothing to revert as the previous was not applied but I'm
>>>>> of course ready to review any new attempts.
>>>>> 
>>>> 
>>>> Hi Jarkko,
>>>> 
>>>> By “revert” I meant we revert the timeout value changes by applying
>>>> the patch I proposed, as the timeout value discussed does cause issues.
>>>> 
>>>> Why don’t we apply the patch and improve the perf in the way of not
>>>> breaking TPMs ? 
>>>> 
>>>> Hao
>>> 
>>> Hi Jarkko and folks,
>>> 
>>> It’s being a while since our last discussion. I want to push a fix in the upstream for ateml chip. 
>>> It looks like we currently have following choices:
>>> 1.  generic fix for all vendors: have a lookup table for sleep time of wait_for_tpm_stat 
>>> (i.e. TPM_TIMEOUT_WAIT_STAT in my proposed patch) 
>>> 2.  quick fix for the regression: change the sleep time of wait_for_tpm_stat back to 15ms.
>>> It is the current proposed patch
>>> 3. Fix regression by making exception for ateml chip.  
>>> 
>>> Should we reach consensus on which one we want to pursue before dig
>>> into implementation of the patch? In my opinion, I prefer to fix the
>>> regression with 2, and then pursue 1 as long-term solution. 3 is
>>> hacky.
>> 
>> What does option 1 fix for *all* vendors?
>> 
>>> Let me know what do you guys think
>>> 
>>> Hao
>> 
>> /Jarkko
> 
> Hi Jarkko and folks,
> 
> It has been a while again. In my previous message I answered Jarkko’s question about the option 1.
> Jarkko, let me know if it is clear to you or you have further questions and suggestions on next to do.
> Somehow I couldn’t found the last message I sent but it is in 
> https://patchwork.kernel.org/project/linux-integrity/patch/20200926223150.109645-1-hao.wu@rubrik.com/
> 
> In high-level, the option 1 is to add a timing lookup table for each manufacture, hence we can
> configure timing for each chip respectively. Then we don’t need to worry about fixing ATMEL
> timing may cause performance degradation for other chips.
> 
> I do want to push the fix in TPM driver, which is likely to be hit going forward again when people are doing
> refactoring without testing chips from all manufacturing.
> 
> Let me know how should I push this forward.
> 
> Thanks
> Hao
> 
It looks like Jarkko’s email address (jarkko.sakkinen@linux.intel.com) is unreachable now,
can other TPM maintainer / reviewer help make a call and unblock this ? 

Thanks
Hao

  reply	other threads:[~2021-05-09  6:31 UTC|newest]

Thread overview: 51+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-09-26 22:31 [PATCH] Fix Atmel TPM crash caused by too frequent queries Hao Wu
2020-09-26 22:57 ` James Bottomley
2020-09-26 23:10   ` Hao Wu
2020-09-27 18:25     ` James Bottomley
2020-09-28  0:11       ` Hao Wu
2020-09-28  0:15         ` Hao Wu
2020-09-28  1:22         ` James Bottomley
2020-09-28  5:59           ` Hao Wu
2020-09-28 22:11             ` James Bottomley
2020-09-29  4:46               ` Hao Wu
2020-09-30  2:16               ` Jarkko Sakkinen
2020-09-30 14:54                 ` James Bottomley
2020-09-30 15:37                   ` Jarkko Sakkinen
2020-09-30 20:48                     ` James Bottomley
2020-09-30 21:09                       ` Jarkko Sakkinen
2020-09-30 22:31                         ` James Bottomley
2020-10-01  1:50                           ` Jarkko Sakkinen
2020-10-01  4:53                             ` James Bottomley
2020-10-01 18:15                               ` Nayna
2020-10-01 18:32                                 ` James Bottomley
2020-10-01 23:04                                   ` Jarkko Sakkinen
2020-10-17  6:11                                     ` Hao Wu
2020-10-18  5:09                                       ` Jarkko Sakkinen
2020-10-18  5:20                                         ` Hao Wu
2020-11-14  4:39                                           ` Hao Wu
2020-11-18 21:11                                             ` Jarkko Sakkinen
2020-11-18 23:23                                               ` Hao Wu
2021-05-09  6:18                                               ` Hao Wu
2021-05-09  6:31                                                 ` Hao Wu [this message]
2021-05-10  2:17                                                   ` Mimi Zohar
2021-05-10  3:15                                                     ` Hao Wu
2021-05-10 17:28                                                     ` Jarkko Sakkinen
2020-09-28  1:08       ` Jarkko Sakkinen
2020-09-28  6:03         ` Hao Wu
2020-09-28 14:16           ` Jarkko Sakkinen
2020-09-28 17:49             ` Hao Wu
2020-09-28 19:47               ` Jarkko Sakkinen
2020-09-28 20:27                 ` Hao Wu
2020-09-30  2:11                   ` Jarkko Sakkinen
2020-09-30  3:41                     ` Hao Wu
     [not found]                       ` <EA1EE8F8-F054-4E1B-B830-231398D33CB8@rubrik.com>
2020-10-01 14:16                         ` Mimi Zohar
  -- strict thread matches above, loose matches on Subject: below --
2021-06-20 23:18 Hao Wu
2021-06-23 13:35 ` Jarkko Sakkinen
2021-06-24  5:49   ` Hao Wu
2021-06-29 20:06     ` Jarkko Sakkinen
2021-06-30  4:27       ` Hao Wu
2021-06-24  5:33 ` Hao Wu
2021-06-29 20:07   ` Jarkko Sakkinen
2020-09-14  6:13 Hao Wu
2020-09-14  6:17 ` Greg KH
2020-09-15  2:52 ` Hao Wu

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=12157F1B-93CC-4991-813D-B78A608D1C4F@rubrik.com \
    --to=hao.wu@rubrik.com \
    --cc=James.Bottomley@hansenpartnership.com \
    --cc=anish.jhaveri@rubrik.com \
    --cc=arnd@arndb.de \
    --cc=gregkh@linuxfoundation.org \
    --cc=hamza@hpe.com \
    --cc=jarkko.sakkinen@linux.intel.com \
    --cc=jgg@ziepe.ca \
    --cc=kgold@linux.ibm.com \
    --cc=linux-integrity@vger.kernel.org \
    --cc=nayna@linux.vnet.ibm.com \
    --cc=peterhuewe@gmx.de \
    --cc=pmenzel@molgen.mpg.de \
    --cc=seungyeop.han@rubrik.com \
    --cc=shrihari.kalkar@rubrik.com \
    --cc=why2jjj.linux@gmail.com \
    --cc=zohar@linux.vnet.ibm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).