From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-5.8 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS,MAILING_LIST_MULTI, SPF_HELO_NONE,SPF_PASS autolearn=no autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 35603C433ED for ; Sun, 9 May 2021 06:18:15 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 03217610F7 for ; Sun, 9 May 2021 06:18:14 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229585AbhEIGTQ (ORCPT ); Sun, 9 May 2021 02:19:16 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:37918 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229453AbhEIGTQ (ORCPT ); Sun, 9 May 2021 02:19:16 -0400 Received: from mail-pj1-x1030.google.com (mail-pj1-x1030.google.com [IPv6:2607:f8b0:4864:20::1030]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id AD7DCC061573 for ; Sat, 8 May 2021 23:18:13 -0700 (PDT) Received: by mail-pj1-x1030.google.com with SMTP id gx21-20020a17090b1255b02901589d39576eso1966144pjb.0 for ; Sat, 08 May 2021 23:18:13 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=rubrik.com; s=google; h=mime-version:subject:from:in-reply-to:date:cc :content-transfer-encoding:message-id:references:to; bh=W50yuFWHOQrnf5uez859dnrFf7tx3aXkaDMDsUdGsQ8=; b=oTw1llxUrgOBr65VfAGafspaCyeiNwoOjOA99BF1YUEid4N2Fwnu+DzBPcOoQrKR7d FhLAMD4sCsy8i9tIijrOxHGi6ot9ISL33ck5FNeyBhoGiTCi6zJsF7rWkoT0JsaDVRpy sYJOoSgS6fKuxMw1iyVmLHblAFLMWaXLSIOTo= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:subject:from:in-reply-to:date:cc :content-transfer-encoding:message-id:references:to; bh=W50yuFWHOQrnf5uez859dnrFf7tx3aXkaDMDsUdGsQ8=; b=TWfkJvmzgals0q1ODPXGmsAEcqkifWwGd2Iu5XWT8Y7Ep+CS6ZLAnLTn/CuvNIvlZc gy4PyeAdAoi7m2Ia5G9cWsfnOiwf83PU+cZ7bYyN+X2ADlA5t9aBCWi3GSDH1+ghKVE8 SCaEmvoeOxs2U/nbk3Ru2XfQgAi6Fz2fFcasPEIpeaQZelDUrFDr9pvzscupxsvAx9DS 8veSNDEnEZndl/djxuyrVa9ElhqbBB35ehBYeAIvatxicQumL29kUgeGjL0ucqlxKBYS vROb9JBijskl0heeD6sHxvZkRvKRhe6JReY9WrcQuvx+F1NPsr9rCML0CvM8J9N1msCc nZsQ== X-Gm-Message-State: AOAM533Pyw6YBun/+3pTuQSKZnLJlK4ONo8/gmgPKL+zo7xrRlCxWMXK ITEejWPKixO0WRb9ghG+VN5s1Q== X-Google-Smtp-Source: ABdhPJyR8UkvgQSOIx1ySgxVt9xoJW6KWLBvI3oq1jSeifnyFIt28i9qygyb4orD1kObnijJJqNZnA== X-Received: by 2002:a17:90a:6381:: with SMTP id f1mr10382357pjj.30.1620541092833; Sat, 08 May 2021 23:18:12 -0700 (PDT) Received: from [10.0.0.7] (c-73-231-56-47.hsd1.ca.comcast.net. [73.231.56.47]) by smtp.gmail.com with ESMTPSA id s21sm7993537pjn.29.2021.05.08.23.18.05 (version=TLS1_2 cipher=ECDHE-ECDSA-AES128-GCM-SHA256 bits=128/128); Sat, 08 May 2021 23:18:12 -0700 (PDT) Content-Type: text/plain; charset=utf-8 Mime-Version: 1.0 (Mac OS X Mail 14.0 \(3654.60.0.2.21\)) Subject: Re: [PATCH] Fix Atmel TPM crash caused by too frequent queries From: Hao Wu In-Reply-To: <20201118211134.GA5034@linux.intel.com> Date: Sat, 8 May 2021 23:18:02 -0700 Cc: James Bottomley , Nayna , peterhuewe@gmx.de, jgg@ziepe.ca, arnd@arndb.de, gregkh@linuxfoundation.org, Hamza Attak , why2jjj.linux@gmail.com, zohar@linux.vnet.ibm.com, linux-integrity@vger.kernel.org, Paul Menzel , Ken Goldman , Seungyeop Han , Shrihari Kalkar , Anish Jhaveri Content-Transfer-Encoding: quoted-printable Message-Id: References: <6e7b54c268d25a86f8f969bcc01729eaadef6530.camel@HansenPartnership.com> <20201001015051.GA5971@linux.intel.com> <1aed1b0734435959d5e53b8a4b3c18558243e6b8.camel@HansenPartnership.com> <19de5527-2d56-6a07-3ce7-ba216b208090@linux.vnet.ibm.com> <38e165055bae62d4e97f702c05e3a76ccdeeac0f.camel@HansenPartnership.com> <20201001230426.GA26517@linux.intel.com> <20201018050951.GL68722@linux.intel.com> <53B75B06-FD89-4B00-BC3F-46C5B28DC201@rubrik.com> <9E249567-4901-4FA4-BA89-EF6DE51F7E7A@rubrik.com> <20201118211134.GA5034@linux.intel.com> To: Jarkko Sakkinen X-Mailer: Apple Mail (2.3654.60.0.2.21) Precedence: bulk List-ID: X-Mailing-List: linux-integrity@vger.kernel.org > On Nov 18, 2020, at 1:11 PM, Jarkko Sakkinen = wrote: >=20 > On Fri, Nov 13, 2020 at 08:39:28PM -0800, Hao Wu wrote: >>> On Oct 17, 2020, at 10:20 PM, Hao Wu wrote: >>>=20 >>>> On Oct 17, 2020, at 10:09 PM, Jarkko Sakkinen = wrote: >>>>=20 >>>> On Fri, Oct 16, 2020 at 11:11:37PM -0700, Hao Wu wrote: >>>>>> On Oct 1, 2020, at 4:04 PM, Jarkko Sakkinen = wrote: >>>>>>=20 >>>>>> On Thu, Oct 01, 2020 at 11:32:59AM -0700, James Bottomley wrote: >>>>>>> On Thu, 2020-10-01 at 14:15 -0400, Nayna wrote: >>>>>>>> On 10/1/20 12:53 AM, James Bottomley wrote: >>>>>>>>> On Thu, 2020-10-01 at 04:50 +0300, Jarkko Sakkinen wrote: >>>>>>>>>> On Wed, Sep 30, 2020 at 03:31:20PM -0700, James Bottomley = wrote: >>>>>>>>>>> On Thu, 2020-10-01 at 00:09 +0300, Jarkko Sakkinen wrote: >>>>>>> [...] >>>>>>>>>>>> I also wonder if we could adjust the frequency dynamically. >>>>>>>>>>>> I.e. start with optimistic value and lower it until finding >>>>>>>>>>>> the sweet spot. >>>>>>>>>>>=20 >>>>>>>>>>> The problem is the way this crashes: the TPM seems to be >>>>>>>>>>> unrecoverable. If it were recoverable without a hard reset = of >>>>>>>>>>> the entire machine, we could certainly play around with it. = I >>>>>>>>>>> can try alternative mechanisms to see if anything's viable, = but >>>>>>>>>>> to all intents and purposes, it looks like my TPM simply = stops >>>>>>>>>>> responding to the TIS interface. >>>>>>>>>>=20 >>>>>>>>>> A quickly scraped idea probably with some holes in it but I = was >>>>>>>>>> thinking something like >>>>>>>>>>=20 >>>>>>>>>> 1. Initially set slow value for latency, this could be the >>>>>>>>>> original 15 ms. >>>>>>>>>> 2. Use this to read TPM_PT_VENDOR_STRING_*. >>>>>>>>>> 3. Lookup based vendor string from a fixup table a latency = that >>>>>>>>>> works >>>>>>>>>> (the fallback latency could be the existing latency). >>>>>>>>>=20 >>>>>>>>> Well, yes, that was sort of what I was thinking of doing for = the >>>>>>>>> Atmel ... except I was thinking of using the TIS VID (16 byte >>>>>>>>> assigned vendor ID) which means we can get the information to = set >>>>>>>>> the timeout before we have to do any TPM operations. >>>>>>>>=20 >>>>>>>> I wonder if the timeout issue exists for all TPM commands for = the >>>>>>>> same manufacturer. For example, does the ATMEL TPM also crash = when=20 >>>>>>>> extending PCRs ? >>>>>>>>=20 >>>>>>>> In addition to defining a per TPM vendor based lookup table for >>>>>>>> timeout, would it be a good idea to also define a Kconfig/boot = param >>>>>>>> option to allow timeout setting. This will enable to set the = timeout >>>>>>>> based on the specific use. >>>>>>>=20 >>>>>>> I don't think we need go that far (yet). The timing change has = been in >>>>>>> upstream since: >>>>>>>=20 >>>>>>> commit 424eaf910c329ab06ad03a527ef45dcf6a328f00 >>>>>>> Author: Nayna Jain >>>>>>> Date: Wed May 16 01:51:25 2018 -0400 >>>>>>>=20 >>>>>>> tpm: reduce polling time to usecs for even finer granularity >>>>>>>=20 >>>>>>> Which was in the released kernel 4.18: over two years ago. In = all that >>>>>>> time we've discovered two problems: mine which looks to be an = artifact >>>>>>> of an experimental upgrade process in a new nuvoton and the = Atmel.=20 >>>>>>> That means pretty much every other TPM simply works with the = existing >>>>>>> timings >>>>>>>=20 >>>>>>>> I was also thinking how will we decide the lookup table values = for >>>>>>>> each vendor ? >>>>>>>=20 >>>>>>> I wasn't thinking we would. I was thinking I'd do a simple = exception >>>>>>> for the Atmel and nothing else. I don't think my Nuvoton is in = any way >>>>>>> characteristic. Indeed my pluggable TPM rainbow bridge system = works >>>>>>> just fine with a Nuvoton and the current timings. >>>>>>>=20 >>>>>>> We can add additional exceptions if they actually turn up. >>>>>>=20 >>>>>> I'd add a table and fallback. >>>>>>=20 >>>>>=20 >>>>> Hi folks, >>>>>=20 >>>>> I want to follow up this a bit and check whether we reached a = consensus=20 >>>>> on how to fix the timeout issue for Atmel chip. >>>>>=20 >>>>> Should we revert the changes or introduce the lookup table for = chips. >>>>>=20 >>>>> Is there anything I can help from Rubrik side. >>>>>=20 >>>>> Thanks >>>>> Hao >>>>=20 >>>> There is nothing to revert as the previous was not applied but I'm >>>> of course ready to review any new attempts. >>>>=20 >>>=20 >>> Hi Jarkko, >>>=20 >>> By =E2=80=9Crevert=E2=80=9D I meant we revert the timeout value = changes by applying >>> the patch I proposed, as the timeout value discussed does cause = issues. >>>=20 >>> Why don=E2=80=99t we apply the patch and improve the perf in the way = of not >>> breaking TPMs ?=20 >>>=20 >>> Hao >>=20 >> Hi Jarkko and folks, >>=20 >> It=E2=80=99s being a while since our last discussion. I want to push = a fix in the upstream for ateml chip.=20 >> It looks like we currently have following choices: >> 1. generic fix for all vendors: have a lookup table for sleep time = of wait_for_tpm_stat=20 >> (i.e. TPM_TIMEOUT_WAIT_STAT in my proposed patch)=20 >> 2. quick fix for the regression: change the sleep time of = wait_for_tpm_stat back to 15ms. >> It is the current proposed patch >> 3. Fix regression by making exception for ateml chip. =20 >>=20 >> Should we reach consensus on which one we want to pursue before dig >> into implementation of the patch? In my opinion, I prefer to fix the >> regression with 2, and then pursue 1 as long-term solution. 3 is >> hacky. >=20 > What does option 1 fix for *all* vendors? >=20 >> Let me know what do you guys think >>=20 >> Hao >=20 > /Jarkko Hi Jarkko and folks, It has been a while again. In my previous message I answered Jarkko=E2=80=99= s question about the option 1. Jarkko, let me know if it is clear to you or you have further questions = and suggestions on next to do. Somehow I couldn=E2=80=99t found the last message I sent but it is in=20 = https://patchwork.kernel.org/project/linux-integrity/patch/20200926223150.= 109645-1-hao.wu@rubrik.com/ In high-level, the option 1 is to add a timing lookup table for each = manufacture, hence we can configure timing for each chip respectively. Then we don=E2=80=99t need = to worry about fixing ATMEL timing may cause performance degradation for other chips. I do want to push the fix in TPM driver, which is likely to be hit going = forward again when people are doing refactoring without testing chips from all manufacturing. Let me know how should I push this forward. Thanks Hao