From mboxrd@z Thu Jan 1 00:00:00 1970
X-Mailing-List: regressions@lists.linux.dev
Date: Wed, 26 Apr 2023 02:34:25 +0300
Subject: Re: [REGRESSION] suspend to ram fails in 6.2-rc1 due to tpm errors
From: "Jarkko Sakkinen"
To: "Jarkko Sakkinen", "Jason A. Donenfeld"
Cc: "Thorsten Leemhuis", "James Bottomley", "Vlastimil Babka",
    "Peter Huewe", "Jason Gunthorpe", "Jan Dabros", "LKML",
    "Dominik Brodowski", "Herbert Xu", "Linus Torvalds",
    "Johannes Altmanninger"
References: <7ebab1ff-48f1-2737-f0d3-25c72666d041@leemhuis.info>
    <4268d0ac-278a-28e4-66d1-e0347f011f46@leemhuis.info>

On Sun Apr 23, 2023 at 6:34 PM EEST, Jarkko Sakkinen wrote:
> On Fri Apr 21, 2023 at 9:27 PM EEST, Jason A. Donenfeld wrote:
> > Did you use the patch I sent you and suspend and resume according to
> > the instructions I gave you? If not, I don't have much to add.
>
> Finally, I got it reproduced at my side with TPM 1.2:
>
> [ 0.379677] tpm_tis 00:00: 1.2 TPM (device-id 0x1, rev-id 1)
> [ 32.453447] tpm tpm0: tpm_transmit: tpm_recv: error -5
> [ 33.450601] tpm tpm0: Unable to read header
> [ 33.450607] tpm tpm0: tpm_transmit: tpm_recv: error -62
>
> I'll look at this further after I've sent v6.3 PR.

OK, so this gives the exact tpm_transmit call where it fails:

$ sudo bpftrace -e 'kprobe:tpm_transmit { @[kstack] = count(); }'
[sudo] password for jarkko:
Attaching 1 probe...
^C

@[
    tpm_transmit+1
    tpm1_pcr_read+177
    tpm1_do_selftest+287
    tpm_tis_resume+443
    pnp_bus_resume+102
    dpm_run_callback+81
    device_resume+173
    dpm_resume+238
    dpm_resume_end+17
    suspend_devices_and_enter+473
    enter_state+563
    pm_suspend+68
    state_store+43
    kobj_attr_store+15
    sysfs_kf_write+59
    kernfs_fop_write_iter+304
    vfs_write+590
    ksys_write+115
    __x64_sys_write+25
    do_syscall_64+88
    entry_SYSCALL_64_after_hwframe+114
]: 1
@[
    tpm_transmit+1
    tpm1_do_selftest+179
    tpm_tis_resume+443
    pnp_bus_resume+102
    dpm_run_callback+81
    device_resume+173
    dpm_resume+238
    dpm_resume_end+17
    suspend_devices_and_enter+473
    enter_state+563
    pm_suspend+68
    state_store+43
    kobj_attr_store+15
    sysfs_kf_write+59
    kernfs_fop_write_iter+304
    vfs_write+590
    ksys_write+115
    __x64_sys_write+25
    do_syscall_64+88
    entry_SYSCALL_64_after_hwframe+114
]: 1
@[
    tpm_transmit+1
    tpm1_pm_suspend+203
    tpm_pm_suspend+131
    __pnp_bus_suspend+65
    pnp_bus_suspend+19
    dpm_run_callback+81
    __device_suspend+329
    dpm_suspend+432
    dpm_suspend_start+155
    suspend_devices_and_enter+370
    enter_state+563
    pm_suspend+68
    state_store+43
    kobj_attr_store+15
    sysfs_kf_write+59
    kernfs_fop_write_iter+304
    vfs_write+590
    ksys_write+115
    __x64_sys_write+25
    do_syscall_64+88
    entry_SYSCALL_64_after_hwframe+114
]: 1
@[
    tpm_transmit+1
    tpm1_get_random+206
    tpm_get_random+70
    tpm_hwrng_read+21
    hwrng_fillfn+234
    kthread+230
    ret_from_fork+41
]: 75897

So it is the very first PCR read in tpm1_do_selftest.

There is a bug in plain sight in tpm_tis_resume(): before it calls
tpm1_do_selftest(), it only requests and relinquishes locality. This is
not sufficient: it should also disable the clkrun protocol.
tpm1_do_selftest() is also called during driver initialization, where
it succeeds, the difference being that the clkrun protocol is disabled
there.

I'm now compiling a kernel with a test fix that calls tpm_chip_start()
and tpm_chip_stop() as a substitute for request/relinquish locality.
These should be used anyway instead of ad-hoc code.

BR, Jarkko