Subject: Re: [PATCH] Fix Atmel TPM crash caused by too frequent queries
From: Hao Wu
In-Reply-To: <53B75B06-FD89-4B00-BC3F-46C5B28DC201@rubrik.com>
Date: Fri, 13 Nov 2020 20:39:28 -0800
Cc: James Bottomley, Nayna, peterhuewe@gmx.de, jgg@ziepe.ca, arnd@arndb.de,
 gregkh@linuxfoundation.org, Hamza Attak, why2jjj.linux@gmail.com,
 zohar@linux.vnet.ibm.com, linux-integrity@vger.kernel.org, Paul Menzel,
 Ken Goldman, Seungyeop Han, Shrihari Kalkar, Anish Jhaveri
Message-Id: <9E249567-4901-4FA4-BA89-EF6DE51F7E7A@rubrik.com>
References: <20200930153715.GC52739@linux.intel.com>
 <95aafaa1e3037cb7b99ae0e76c02a419d366a407.camel@HansenPartnership.com>
 <20200930210956.GC65339@linux.intel.com>
 <6e7b54c268d25a86f8f969bcc01729eaadef6530.camel@HansenPartnership.com>
 <20201001015051.GA5971@linux.intel.com>
 <1aed1b0734435959d5e53b8a4b3c18558243e6b8.camel@HansenPartnership.com>
 <19de5527-2d56-6a07-3ce7-ba216b208090@linux.vnet.ibm.com>
 <38e165055bae62d4e97f702c05e3a76ccdeeac0f.camel@HansenPartnership.com>
 <20201001230426.GA26517@linux.intel.com>
 <20201018050951.GL68722@linux.intel.com>
 <53B75B06-FD89-4B00-BC3F-46C5B28DC201@rubrik.com>
To: Jarkko Sakkinen
X-Mailing-List: linux-integrity@vger.kernel.org

> On Oct 17, 2020, at 10:20 PM, Hao Wu wrote:
>
>> On Oct 17, 2020, at 10:09 PM, Jarkko Sakkinen wrote:
>>
>> On Fri, Oct 16, 2020 at 11:11:37PM -0700, Hao Wu wrote:
>>>> On Oct 1, 2020, at 4:04 PM, Jarkko Sakkinen wrote:
>>>>
>>>> On Thu, Oct 01, 2020 at 11:32:59AM -0700, James Bottomley wrote:
>>>>> On Thu, 2020-10-01 at 14:15 -0400, Nayna wrote:
>>>>>> On 10/1/20 12:53 AM, James Bottomley wrote:
>>>>>>> On Thu, 2020-10-01 at 04:50 +0300, Jarkko Sakkinen wrote:
>>>>>>>> On Wed, Sep 30, 2020 at 03:31:20PM -0700, James Bottomley wrote:
>>>>>>>>> On Thu, 2020-10-01 at 00:09 +0300, Jarkko Sakkinen wrote:
>>>>> [...]
>>>>>>>>>> I also wonder if we could adjust the frequency dynamically.
>>>>>>>>>> I.e. start with optimistic value and lower it until finding
>>>>>>>>>> the sweet spot.
>>>>>>>>>
>>>>>>>>> The problem is the way this crashes: the TPM seems to be
>>>>>>>>> unrecoverable. If it were recoverable without a hard reset of
>>>>>>>>> the entire machine, we could certainly play around with it. I
>>>>>>>>> can try alternative mechanisms to see if anything's viable, but
>>>>>>>>> to all intents and purposes, it looks like my TPM simply stops
>>>>>>>>> responding to the TIS interface.
>>>>>>>>
>>>>>>>> A quickly scraped idea probably with some holes in it but I was
>>>>>>>> thinking something like
>>>>>>>>
>>>>>>>> 1. Initially set slow value for latency, this could be the
>>>>>>>> original 15 ms.
>>>>>>>> 2. Use this to read TPM_PT_VENDOR_STRING_*.
>>>>>>>> 3. Lookup based vendor string from a fixup table a latency that
>>>>>>>> works
>>>>>>>> (the fallback latency could be the existing latency).
>>>>>>>
>>>>>>> Well, yes, that was sort of what I was thinking of doing for the
>>>>>>> Atmel ... except I was thinking of using the TIS VID (16 byte
>>>>>>> assigned vendor ID) which means we can get the information to set
>>>>>>> the timeout before we have to do any TPM operations.
>>>>>>
>>>>>> I wonder if the timeout issue exists for all TPM commands for the
>>>>>> same manufacturer. For example, does the ATMEL TPM also crash when
>>>>>> extending PCRs ?
>>>>>>
>>>>>> In addition to defining a per TPM vendor based lookup table for
>>>>>> timeout, would it be a good idea to also define a Kconfig/boot param
>>>>>> option to allow timeout setting. This will enable to set the timeout
>>>>>> based on the specific use.
>>>>>
>>>>> I don't think we need go that far (yet). The timing change has been in
>>>>> upstream since:
>>>>>
>>>>> commit 424eaf910c329ab06ad03a527ef45dcf6a328f00
>>>>> Author: Nayna Jain
>>>>> Date: Wed May 16 01:51:25 2018 -0400
>>>>>
>>>>> tpm: reduce polling time to usecs for even finer granularity
>>>>>
>>>>> Which was in the released kernel 4.18: over two years ago. In all that
>>>>> time we've discovered two problems: mine which looks to be an artifact
>>>>> of an experimental upgrade process in a new nuvoton and the Atmel.
>>>>> That means pretty much every other TPM simply works with the existing
>>>>> timings
>>>>>
>>>>>> I was also thinking how will we decide the lookup table values for
>>>>>> each vendor ?
>>>>>
>>>>> I wasn't thinking we would. I was thinking I'd do a simple exception
>>>>> for the Atmel and nothing else. I don't think my Nuvoton is in any way
>>>>> characteristic. Indeed my pluggable TPM rainbow bridge system works
>>>>> just fine with a Nuvoton and the current timings.
>>>>>
>>>>> We can add additional exceptions if they actually turn up.
>>>>
>>>> I'd add a table and fallback.
>>>>
>>>
>>> Hi folks,
>>>
>>> I want to follow up this a bit and check whether we reached a consensus
>>> on how to fix the timeout issue for Atmel chip.
>>>
>>> Should we revert the changes or introduce the lookup table for chips.
>>>
>>> Is there anything I can help from Rubrik side.
>>>
>>> Thanks
>>> Hao
>>
>> There is nothing to revert as the previous was not applied but I'm
>> of course ready to review any new attempts.
>>
>
> Hi Jarkko,
>
> By “revert” I meant we revert the timeout value changes by applying
> the patch I proposed, as the timeout value discussed does cause issues.
>
> Why don’t we apply the patch and improve the perf in the way of not
> breaking TPMs ?
>
> Hao

Hi Jarkko and folks,

It’s been a while since our last discussion. I want to push a fix
upstream for the Atmel chip.

It looks like we currently have the following choices:
1. Generic fix for all vendors: have a lookup table for the sleep time of
   wait_for_tpm_stat (i.e. TPM_TIMEOUT_WAIT_STAT in my proposed patch).
2. Quick fix for the regression: change the sleep time of
   wait_for_tpm_stat back to 15 ms. This is the currently proposed patch.
3. Fix the regression by making an exception for the Atmel chip.

Should we reach consensus on which one we want to pursue before digging
into the implementation of the patch? My preference is to fix the
regression with 2, and then pursue 1 as the long-term solution. 3 is hacky.

Let me know what you think.

Hao
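
To make option 1 (and the "table and fallback" idea quoted above) a bit
more concrete, here is a minimal user-space sketch, not a kernel patch.
Everything named in it is hypothetical: the struct, the function names
and the macros are made up for illustration. The Atmel vendor ID
(0x1114) and the layout of the TIS DID_VID register (vendor ID in the
low 16 bits) are assumptions to double-check against the TCG
documents, and the 500 us fallback is only a placeholder for whatever
the existing latency is. The 15 ms value is the original sleep time
discussed in this thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical fixup entry: TIS vendor ID -> polling sleep in microseconds. */
struct tpm_poll_fixup {
	uint16_t vid;
	uint32_t sleep_us;
};

/* Assumption: 0x1114 is the vendor ID reported by Atmel TPMs. */
#define SKETCH_VID_ATMEL        0x1114u
/* The original 15 ms sleep mentioned in the thread. */
#define SKETCH_SLOW_SLEEP_US    15000u
/* Placeholder for the existing (fast) latency; not taken from the kernel. */
#define SKETCH_DEFAULT_SLEEP_US 500u

/* Only chips known to misbehave get an entry; all others use the fallback. */
static const struct tpm_poll_fixup tpm_poll_fixups[] = {
	{ SKETCH_VID_ATMEL, SKETCH_SLOW_SLEEP_US },
};

#define SKETCH_N_FIXUPS (sizeof(tpm_poll_fixups) / sizeof(tpm_poll_fixups[0]))
static_assert(SKETCH_N_FIXUPS > 0, "fixup table must not be empty");

/* Assumption: the vendor ID sits in the low 16 bits of the DID_VID value. */
uint16_t tis_vid_from_did_vid(uint32_t did_vid)
{
	return (uint16_t)(did_vid & 0xffffu);
}

/* Return the sleep time for a vendor, falling back to the default latency. */
uint32_t tpm_poll_sleep_us(uint16_t vid)
{
	size_t i;

	for (i = 0; i < SKETCH_N_FIXUPS; i++) {
		if (tpm_poll_fixups[i].vid == vid)
			return tpm_poll_fixups[i].sleep_us;
	}
	return SKETCH_DEFAULT_SLEEP_US;
}
```

With a table like this, option 3 is just the one-entry case of option 1:
the lookup keyed on the TIS vendor ID can run before any TPM command is
sent, and a new vendor only needs a new row if it actually turns up.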