* [PATCH] tpm: Revert "tpm: fix invalid locking in NONBLOCKING mode" @ 2020-05-26 18:32 Mario Limonciello 2020-05-26 19:14 ` James Bottomley 0 siblings, 1 reply; 20+ messages in thread From: Mario Limonciello @ 2020-05-26 18:32 UTC (permalink / raw) To: Peter Huewe, Jarkko Sakkinen, Jason Gunthorpe Cc: Arnd Bergmann, Greg Kroah-Hartman, linux-integrity, linux-kernel, Mario Limonciello, Jeffrin Jose T, Alex Guzman This reverts commit d23d12484307b40eea549b8a858f5fffad913897. This commit has caused regressions for the XPS 9560 containing a Nuvoton TPM. As mentioned by the reporter all TPM2 commands are failing with: ERROR:tcti:src/tss2-tcti/tcti-device.c:290:tcti_device_receive() Failed to read response from fd 3, got errno 1: Operation not permitted The reporter bisected this issue back to this commit which was backported to stable as commit 4d6ebc4. Cc: Jeffrin Jose T <jeffrin@rajagiritech.edu.in> BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=206275 Fixes: d23d124 ("tpm: fix invalid locking in NONBLOCKING mode") Reported-by: Alex Guzman <alex@guzman.io> Signed-off-by: Mario Limonciello <mario.limonciello@dell.com> --- drivers/char/tpm/tpm-dev-common.c | 8 -------- 1 file changed, 8 deletions(-) diff --git a/drivers/char/tpm/tpm-dev-common.c b/drivers/char/tpm/tpm-dev-common.c index 87f449340202..bc9d7c7ddc01 100644 --- a/drivers/char/tpm/tpm-dev-common.c +++ b/drivers/char/tpm/tpm-dev-common.c @@ -61,12 +61,6 @@ static void tpm_dev_async_work(struct work_struct *work) mutex_lock(&priv->buffer_mutex); priv->command_enqueued = false; - ret = tpm_try_get_ops(priv->chip); - if (ret) { - priv->response_length = ret; - goto out; - } - ret = tpm_dev_transmit(priv->chip, priv->space, priv->data_buffer, sizeof(priv->data_buffer)); tpm_put_ops(priv->chip); @@ -74,7 +68,6 @@ static void tpm_dev_async_work(struct work_struct *work) priv->response_length = ret; mod_timer(&priv->user_read_timer, jiffies + (120 * HZ)); } -out: mutex_unlock(&priv->buffer_mutex); wake_up_interruptible(&priv->async_wait); } @@ -211,7 +204,6 @@ ssize_t tpm_common_write(struct file *file, const char __user *buf, if (file->f_flags & O_NONBLOCK) { priv->command_enqueued = true; queue_work(tpm_dev_wq, &priv->async_work); - tpm_put_ops(priv->chip); mutex_unlock(&priv->buffer_mutex); return size; } -- 2.25.1 ^ permalink raw reply related [flat|nested] 20+ messages in thread
* Re: [PATCH] tpm: Revert "tpm: fix invalid locking in NONBLOCKING mode" 2020-05-26 18:32 [PATCH] tpm: Revert "tpm: fix invalid locking in NONBLOCKING mode" Mario Limonciello @ 2020-05-26 19:14 ` James Bottomley 2020-05-26 19:23 ` Mario.Limonciello 2020-05-26 19:39 ` Tadeusz Struk 0 siblings, 2 replies; 20+ messages in thread From: James Bottomley @ 2020-05-26 19:14 UTC (permalink / raw) To: Mario Limonciello, Peter Huewe, Jarkko Sakkinen, Jason Gunthorpe Cc: Arnd Bergmann, Greg Kroah-Hartman, linux-integrity, linux-kernel, Jeffrin Jose T, Alex Guzman On Tue, 2020-05-26 at 13:32 -0500, Mario Limonciello wrote: > This reverts commit d23d12484307b40eea549b8a858f5fffad913897. > > This commit has caused regressions for the XPS 9560 containing > a Nuvoton TPM. Presumably this is using the tis driver? > As mentioned by the reporter all TPM2 commands are failing with: > ERROR:tcti:src/tss2-tcti/tcti-device.c:290:tcti_device_receive() > Failed to read response from fd 3, got errno 1: Operation not > permitted > > The reporter bisected this issue back to this commit which was > backported to stable as commit 4d6ebc4. I think the problem is request_locality ... for some inexplicable reason a failure there returns -1, which is EPERM to user space. That seems to be a bug in the async code since everything else gives a ESPIPE error if tpm_try_get_ops fails ... at least no-one assumes it gives back a sensible return code. What I think is happening is that with the patch the TPM goes through a quick sequence of request, relinquish, request, relinquish and it's the third request which is failing (likely timing out). Without the patch, the patch there's only one request,relinquish cycle because the ops are held while the async work is executed. I have a vague recollection that there is a problem with too many locality request in quick succession, but I'll defer to Jason, who I think understands the intricacies of localities better than I do. If that's the problem, the solution looks simple enough: just move the ops get down because the priv state is already protected by the buffer mutex James --- diff --git a/drivers/char/tpm/tpm-dev-common.c b/drivers/char/tpm/tpm-dev-common.c index 87f449340202..1784530b8387 100644 --- a/drivers/char/tpm/tpm-dev-common.c +++ b/drivers/char/tpm/tpm-dev-common.c @@ -189,15 +189,6 @@ ssize_t tpm_common_write(struct file *file, const char __user *buf, goto out; } - /* atomic tpm command send and result receive. We only hold the ops - * lock during this period so that the tpm can be unregistered even if - * the char dev is held open. - */ - if (tpm_try_get_ops(priv->chip)) { - ret = -EPIPE; - goto out; - } - priv->response_length = 0; priv->response_read = false; *off = 0; @@ -211,11 +202,19 @@ ssize_t tpm_common_write(struct file *file, const char __user *buf, if (file->f_flags & O_NONBLOCK) { priv->command_enqueued = true; queue_work(tpm_dev_wq, &priv->async_work); - tpm_put_ops(priv->chip); mutex_unlock(&priv->buffer_mutex); return size; } + /* atomic tpm command send and result receive. We only hold the ops + * lock during this period so that the tpm can be unregistered even if + * the char dev is held open. + */ + if (tpm_try_get_ops(priv->chip)) { + ret = -EPIPE; + goto out; + } + ret = tpm_dev_transmit(priv->chip, priv->space, priv->data_buffer, sizeof(priv->data_buffer)); tpm_put_ops(priv->chip); ^ permalink raw reply related [flat|nested] 20+ messages in thread
* RE: [PATCH] tpm: Revert "tpm: fix invalid locking in NONBLOCKING mode" 2020-05-26 19:14 ` James Bottomley @ 2020-05-26 19:23 ` Mario.Limonciello 2020-05-26 19:38 ` James Bottomley 2020-05-26 19:39 ` Tadeusz Struk 1 sibling, 1 reply; 20+ messages in thread From: Mario.Limonciello @ 2020-05-26 19:23 UTC (permalink / raw) To: James.Bottomley, peterhuewe, jarkko.sakkinen, jgg Cc: arnd, gregkh, linux-integrity, linux-kernel, jeffrin, alex > On Tue, 2020-05-26 at 13:32 -0500, Mario Limonciello wrote: > > This reverts commit d23d12484307b40eea549b8a858f5fffad913897. > > > > This commit has caused regressions for the XPS 9560 containing > > a Nuvoton TPM. > > Presumably this is using the tis driver? Correct. > > > As mentioned by the reporter all TPM2 commands are failing with: > > ERROR:tcti:src/tss2-tcti/tcti-device.c:290:tcti_device_receive() > > Failed to read response from fd 3, got errno 1: Operation not > > permitted > > > > The reporter bisected this issue back to this commit which was > > backported to stable as commit 4d6ebc4. > > I think the problem is request_locality ... for some inexplicable > reason a failure there returns -1, which is EPERM to user space. > > That seems to be a bug in the async code since everything else gives a > ESPIPE error if tpm_try_get_ops fails ... at least no-one assumes it > gives back a sensible return code. > > What I think is happening is that with the patch the TPM goes through a > quick sequence of request, relinquish, request, relinquish and it's the > third request which is failing (likely timing out). Without the patch, > the patch there's only one request,relinquish cycle because the ops are > held while the async work is executed. I have a vague recollection > that there is a problem with too many locality request in quick > succession, but I'll defer to Jason, who I think understands the > intricacies of localities better than I do. Thanks, I don't pretend to understand the nuances of this particular code, but I was hoping that the request to revert got some attention since Alex's kernel Bugzilla and message a few months ago to linux integrity weren't. > > If that's the problem, the solution looks simple enough: just move the > ops get down because the priv state is already protected by the buffer > mutex Yeah, if that works for Alex's situation it certainly sounds like a better solution than reverting this patch as this patch actually does fix a problem reported by Jeffrin originally. Could you propose a specific patch that Alex and Jeffrin can perhaps both try? > > James > > --- > > diff --git a/drivers/char/tpm/tpm-dev-common.c b/drivers/char/tpm/tpm-dev- > common.c > index 87f449340202..1784530b8387 100644 > --- a/drivers/char/tpm/tpm-dev-common.c > +++ b/drivers/char/tpm/tpm-dev-common.c > @@ -189,15 +189,6 @@ ssize_t tpm_common_write(struct file *file, const char > __user *buf, > goto out; > } > > - /* atomic tpm command send and result receive. We only hold the ops > - * lock during this period so that the tpm can be unregistered even if > - * the char dev is held open. > - */ > - if (tpm_try_get_ops(priv->chip)) { > - ret = -EPIPE; > - goto out; > - } > - > priv->response_length = 0; > priv->response_read = false; > *off = 0; > @@ -211,11 +202,19 @@ ssize_t tpm_common_write(struct file *file, const char > __user *buf, > if (file->f_flags & O_NONBLOCK) { > priv->command_enqueued = true; > queue_work(tpm_dev_wq, &priv->async_work); > - tpm_put_ops(priv->chip); > mutex_unlock(&priv->buffer_mutex); > return size; > } > > + /* atomic tpm command send and result receive. We only hold the ops > + * lock during this period so that the tpm can be unregistered even if > + * the char dev is held open. > + */ > + if (tpm_try_get_ops(priv->chip)) { > + ret = -EPIPE; > + goto out; > + } > + > ret = tpm_dev_transmit(priv->chip, priv->space, priv->data_buffer, > sizeof(priv->data_buffer)); > tpm_put_ops(priv->chip); ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH] tpm: Revert "tpm: fix invalid locking in NONBLOCKING mode" 2020-05-26 19:23 ` Mario.Limonciello @ 2020-05-26 19:38 ` James Bottomley 2020-05-26 22:19 ` Alex Guzman 2020-05-27 20:09 ` Jarkko Sakkinen 0 siblings, 2 replies; 20+ messages in thread From: James Bottomley @ 2020-05-26 19:38 UTC (permalink / raw) To: Mario.Limonciello, peterhuewe, jarkko.sakkinen, jgg Cc: arnd, gregkh, linux-integrity, linux-kernel, jeffrin, alex On Tue, 2020-05-26 at 19:23 +0000, Mario.Limonciello@dell.com wrote: > > On Tue, 2020-05-26 at 13:32 -0500, Mario Limonciello wrote: > > > This reverts commit d23d12484307b40eea549b8a858f5fffad913897. > > > > > > This commit has caused regressions for the XPS 9560 containing > > > a Nuvoton TPM. > > > > Presumably this is using the tis driver? > > Correct. > > > > > > As mentioned by the reporter all TPM2 commands are failing with: > > > ERROR:tcti:src/tss2-tcti/tcti- > > > device.c:290:tcti_device_receive() > > > Failed to read response from fd 3, got errno 1: Operation not > > > permitted > > > > > > The reporter bisected this issue back to this commit which was > > > backported to stable as commit 4d6ebc4. > > > > I think the problem is request_locality ... for some inexplicable > > reason a failure there returns -1, which is EPERM to user space. > > > > That seems to be a bug in the async code since everything else > > gives a ESPIPE error if tpm_try_get_ops fails ... at least no-one > > assumes it gives back a sensible return code. > > > > What I think is happening is that with the patch the TPM goes > > through a quick sequence of request, relinquish, request, > > relinquish and it's the third request which is failing (likely > > timing out). Without the patch, the patch there's only one > > request,relinquish cycle because the ops are held while the async > > work is executed. I have a vague recollection that there is a > > problem with too many locality request in quick succession, but > > I'll defer to Jason, who I think understands the intricacies of > > localities better than I do. > > Thanks, I don't pretend to understand the nuances of this particular > code, but I was hoping that the request to revert got some attention > since Alex's kernel Bugzilla and message a few months ago to linux > integrity weren't. > > > > > If that's the problem, the solution looks simple enough: just move > > the ops get down because the priv state is already protected by the > > buffer mutex > > Yeah, if that works for Alex's situation it certainly sounds like a > better solution than reverting this patch as this patch actually does > fix a problem reported by Jeffrin originally. > > Could you propose a specific patch that Alex and Jeffrin can perhaps > both try? Um, what's wrong with the one I originally attached and which you quote below? It's only compile tested, but I think it will work, if the theory is correct. James > > > > James > > > > --- > > > > diff --git a/drivers/char/tpm/tpm-dev-common.c > > b/drivers/char/tpm/tpm-dev- > > common.c > > index 87f449340202..1784530b8387 100644 > > --- a/drivers/char/tpm/tpm-dev-common.c > > +++ b/drivers/char/tpm/tpm-dev-common.c > > @@ -189,15 +189,6 @@ ssize_t tpm_common_write(struct file *file, > > const char > > __user *buf, > > goto out; > > } > > > > - /* atomic tpm command send and result receive. We only > > hold the ops > > - * lock during this period so that the tpm can be > > unregistered even if > > - * the char dev is held open. > > - */ > > - if (tpm_try_get_ops(priv->chip)) { > > - ret = -EPIPE; > > - goto out; > > - } > > - > > priv->response_length = 0; > > priv->response_read = false; > > *off = 0; > > @@ -211,11 +202,19 @@ ssize_t tpm_common_write(struct file *file, > > const char > > __user *buf, > > if (file->f_flags & O_NONBLOCK) { > > priv->command_enqueued = true; > > queue_work(tpm_dev_wq, &priv->async_work); > > - tpm_put_ops(priv->chip); > > mutex_unlock(&priv->buffer_mutex); > > return size; > > } > > > > + /* atomic tpm command send and result receive. We only > > hold the ops > > + * lock during this period so that the tpm can be > > unregistered even if > > + * the char dev is held open. > > + */ > > + if (tpm_try_get_ops(priv->chip)) { > > + ret = -EPIPE; > > + goto out; > > + } > > + > > ret = tpm_dev_transmit(priv->chip, priv->space, priv- > > > data_buffer, > > > > sizeof(priv->data_buffer)); > > tpm_put_ops(priv->chip); ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH] tpm: Revert "tpm: fix invalid locking in NONBLOCKING mode" 2020-05-26 19:38 ` James Bottomley @ 2020-05-26 22:19 ` Alex Guzman 2020-05-26 23:06 ` James Bottomley 2020-05-27 20:09 ` Jarkko Sakkinen 1 sibling, 1 reply; 20+ messages in thread From: Alex Guzman @ 2020-05-26 22:19 UTC (permalink / raw) To: James Bottomley, Mario.Limonciello, peterhuewe, jarkko.sakkinen, jgg Cc: arnd, gregkh, linux-integrity, linux-kernel, jeffrin [-- Attachment #1: Type: text/plain, Size: 4462 bytes --] On Tue, 2020-05-26 at 12:38 -0700, James Bottomley wrote: > On Tue, 2020-05-26 at 19:23 +0000, Mario.Limonciello@dell.com wrote: > > > On Tue, 2020-05-26 at 13:32 -0500, Mario Limonciello wrote: > > > > This reverts commit d23d12484307b40eea549b8a858f5fffad913897. > > > > > > > > This commit has caused regressions for the XPS 9560 containing > > > > a Nuvoton TPM. > > > > > > Presumably this is using the tis driver? > > > > Correct. > > > > > > As mentioned by the reporter all TPM2 commands are failing > > > > with: > > > > ERROR:tcti:src/tss2-tcti/tcti- > > > > device.c:290:tcti_device_receive() > > > > Failed to read response from fd 3, got errno 1: Operation not > > > > permitted > > > > > > > > The reporter bisected this issue back to this commit which was > > > > backported to stable as commit 4d6ebc4. > > > > > > I think the problem is request_locality ... for some inexplicable > > > reason a failure there returns -1, which is EPERM to user space. > > > > > > That seems to be a bug in the async code since everything else > > > gives a ESPIPE error if tpm_try_get_ops fails ... at least no-one > > > assumes it gives back a sensible return code. > > > > > > What I think is happening is that with the patch the TPM goes > > > through a quick sequence of request, relinquish, request, > > > relinquish and it's the third request which is failing (likely > > > timing out). Without the patch, the patch there's only one > > > request,relinquish cycle because the ops are held while the async > > > work is executed. I have a vague recollection that there is a > > > problem with too many locality request in quick succession, but > > > I'll defer to Jason, who I think understands the intricacies of > > > localities better than I do. > > > > Thanks, I don't pretend to understand the nuances of this > > particular > > code, but I was hoping that the request to revert got some > > attention > > since Alex's kernel Bugzilla and message a few months ago to linux > > integrity weren't. > > > > > If that's the problem, the solution looks simple enough: just > > > move > > > the ops get down because the priv state is already protected by > > > the > > > buffer mutex > > > > Yeah, if that works for Alex's situation it certainly sounds like a > > better solution than reverting this patch as this patch actually > > does > > fix a problem reported by Jeffrin originally. > > > > Could you propose a specific patch that Alex and Jeffrin can > > perhaps > > both try? > > Um, what's wrong with the one I originally attached and which you > quote > below? It's only compile tested, but I think it will work, if the > theory is correct. > > James > > > > James > > > > > > --- > > > > > > diff --git a/drivers/char/tpm/tpm-dev-common.c > > > b/drivers/char/tpm/tpm-dev- > > > common.c > > > index 87f449340202..1784530b8387 100644 > > > --- a/drivers/char/tpm/tpm-dev-common.c > > > +++ b/drivers/char/tpm/tpm-dev-common.c > > > @@ -189,15 +189,6 @@ ssize_t tpm_common_write(struct file *file, > > > const char > > > __user *buf, > > > goto out; > > > } > > > > > > - /* atomic tpm command send and result receive. We only > > > hold the ops > > > - * lock during this period so that the tpm can be > > > unregistered even if > > > - * the char dev is held open. > > > - */ > > > - if (tpm_try_get_ops(priv->chip)) { > > > - ret = -EPIPE; > > > - goto out; > > > - } > > > - > > > priv->response_length = 0; > > > priv->response_read = false; > > > *off = 0; > > > @@ -211,11 +202,19 @@ ssize_t tpm_common_write(struct file *file, > > > const char > > > __user *buf, > > > if (file->f_flags & O_NONBLOCK) { > > > priv->command_enqueued = true; > > > queue_work(tpm_dev_wq, &priv->async_work); > > > - tpm_put_ops(priv->chip); > > > mutex_unlock(&priv->buffer_mutex); > > > return size; > > > } > > > > > > + /* atomic tpm command send and result receive. We only > > > hold the ops > > > + * lock during this period so that the tpm can be > > > unregistered even if > > > + * the char dev is held open. > > > + */ > > > + if (tpm_try_get_ops(priv->chip)) { > > > + ret = -EPIPE; > > > + goto out; > > > + } > > > + > > > ret = tpm_dev_transmit(priv->chip, priv->space, priv- > > > > data_buffer, > > > > > > sizeof(priv->data_buffer)); > > > tpm_put_ops(priv->chip); When using your patch, I get a hang when trying to use tpm2_getcap, and dmesg shows some info. [-- Attachment #2: buglog.txt --] [-- Type: text/plain, Size: 3761 bytes --] [ 570.913779] BUG: unable to handle page fault for address: ffffb20001247000 [ 570.913782] #PF: supervisor write access in kernel mode [ 570.913783] #PF: error_code(0x0002) - not-present page [ 570.913784] PGD 0 P4D 0 [ 570.913785] Oops: 0002 [#3] SMP PTI [ 570.913787] CPU: 6 PID: 24744 Comm: tpm2_getcap Tainted: G UD 5.7.0-rc7+ #31 [ 570.913788] Hardware name: Dell Inc. XPS 15 9560/05FFDN, BIOS 1.18.0 11/17/2019 [ 570.913791] RIP: 0010:iowrite8+0x9/0x50 [ 570.913792] Code: 48 c7 c2 40 43 9f 99 48 89 04 24 e8 14 a7 90 ff 0f 0b 48 8b 04 24 48 83 c4 08 c3 66 0f 1f 44 00 00 48 81 fe ff ff 03 00 76 04 <40> 88 3e c3 48 81 fe 00 00 01 00 76 07 0f b7 d6 89 f8 ee c3 8b 05 [ 570.913793] RSP: 0018:ffffb1ff049d7db0 EFLAGS: 00010292 [ 570.913794] RAX: ffffffff981bf520 RBX: ffffb1ff049d7df9 RCX: ffffb1ff049d7df8 [ 570.913795] RDX: 0000000000000001 RSI: ffffb20001247000 RDI: 0000000000000020 [ 570.913796] RBP: ffffb1ff049d7df9 R08: 0000000000000000 R09: ffff8b80de5370f0 [ 570.913797] R10: 0000000000b71b00 R11: 000000000000028f R12: ffff8b80b148cda8 [ 570.913797] R13: 00000000fffff000 R14: ffff8b80b148cda8 R15: ffff8b80cb44a0ba [ 570.913799] FS: 00007f78f7cd0d80(0000) GS:ffff8b80de500000(0000) knlGS:0000000000000000 [ 570.913799] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 570.913800] CR2: ffffb20001247000 CR3: 0000000795618001 CR4: 00000000003606e0 [ 570.913801] Call Trace: [ 570.913803] tpm_tcg_write_bytes+0x2f/0x40 [ 570.913805] release_locality+0x49/0x220 [ 570.913807] tpm_relinquish_locality+0x1f/0x40 [ 570.913808] tpm_chip_stop+0x21/0x40 [ 570.913810] tpm_put_ops+0x9/0x30 [ 570.913811] tpm_common_write+0x179/0x190 [ 570.913813] vfs_write+0xb1/0x1a0 [ 570.913815] ksys_write+0x5a/0xd0 [ 570.913816] do_syscall_64+0x43/0x130 [ 570.913819] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [ 570.913820] RIP: 0033:0x7f78f7e00123 [ 570.913821] Code: 0c 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb bb 0f 1f 80 00 00 00 00 64 8b 04 25 18 00 00 00 85 c0 75 14 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 55 c3 0f 1f 40 00 48 83 ec 28 48 89 54 24 18 [ 570.913822] RSP: 002b:00007fff724e8c68 EFLAGS: 00000246 ORIG_RAX: 0000000000000001 [ 570.913823] RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00007f78f7e00123 [ 570.913824] RDX: 0000000000000016 RSI: 0000564cf24a7220 RDI: 0000000000000003 [ 570.913825] RBP: 0000000000000016 R08: 00007f78f7ccc785 R09: 00007f78f7ccca40 [ 570.913826] R10: 00007fff724e8b10 R11: 0000000000000246 R12: 0000564cf24a7220 [ 570.913826] R13: 0000000000000000 R14: 0000000000000016 R15: 00007f78f7ccc890 [ 570.913827] Modules linked in: squashfs rtsx_pci_sdmmc x86_pkg_temp_thermal coretemp rtsx_pci mfd_core [ 570.913831] CR2: ffffb20001247000 [ 570.913832] ---[ end trace c84437b00f0d01a0 ]--- [ 570.913833] RIP: 0010:iowrite8+0x9/0x50 [ 570.913834] Code: 48 c7 c2 40 43 9f 99 48 89 04 24 e8 14 a7 90 ff 0f 0b 48 8b 04 24 48 83 c4 08 c3 66 0f 1f 44 00 00 48 81 fe ff ff 03 00 76 04 <40> 88 3e c3 48 81 fe 00 00 01 00 76 07 0f b7 d6 89 f8 ee c3 8b 05 [ 570.913835] RSP: 0018:ffffb1ff030b7db0 EFLAGS: 00010292 [ 570.913836] RAX: ffffffff981bf520 RBX: ffffb1ff030b7df9 RCX: ffffb1ff030b7df8 [ 570.913837] RDX: 0000000000000001 RSI: ffffb20001247000 RDI: 0000000000000020 [ 570.913837] RBP: ffffb1ff030b7df9 R08: 0000000000000000 R09: ffff8b80de2370f0 [ 570.913838] R10: 0000000000b71b00 R11: 000000000000019c R12: ffff8b80b148cda8 [ 570.913839] R13: 00000000fffff000 R14: ffff8b80b148cda8 R15: ffff8b80c4cfc0ba [ 570.913840] FS: 00007f78f7cd0d80(0000) GS:ffff8b80de500000(0000) knlGS:0000000000000000 [ 570.913840] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 570.913841] CR2: ffffb20001247000 CR3: 0000000795618001 CR4: 00000000003606e0 ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH] tpm: Revert "tpm: fix invalid locking in NONBLOCKING mode" 2020-05-26 22:19 ` Alex Guzman @ 2020-05-26 23:06 ` James Bottomley 2020-05-26 23:31 ` Alex Guzman 0 siblings, 1 reply; 20+ messages in thread From: James Bottomley @ 2020-05-26 23:06 UTC (permalink / raw) To: Alex Guzman, Mario.Limonciello, peterhuewe, jarkko.sakkinen, jgg Cc: arnd, gregkh, linux-integrity, linux-kernel, jeffrin On Tue, 2020-05-26 at 15:19 -0700, Alex Guzman wrote: [...] > When using your patch, I get a hang when trying to use tpm2_getcap, > and dmesg shows some info. Are you sure it's all applied? This > [ 570.913803] tpm_tcg_write_bytes+0x2f/0x40 > [ 570.913805] release_locality+0x49/0x220 > [ 570.913807] tpm_relinquish_locality+0x1f/0x40 > [ 570.913808] tpm_chip_stop+0x21/0x40 > [ 570.913810] tpm_put_ops+0x9/0x30 > [ 570.913811] tpm_common_write+0x179/0x190 > [ 570.913813] vfs_write+0xb1/0x1a0 Implies an unmatched tpm_put_ops() in the async write path, as though this hunk: > @@ -211,11 +202,19 @@ ssize_t tpm_common_write(struct file *file, > const char __user *buf, > if (file->f_flags & O_NONBLOCK) { > priv->command_enqueued = true; > queue_work(tpm_dev_wq, &priv->async_work); > - tpm_put_ops(priv->chip); > mutex_unlock(&priv->buffer_mutex); > return size; > } Is missing. I actually booted the patch in my TPM based VM and it all seems to work OK when I execute tpm2_getcap (I verified it's using O_NONBLOCK) and tssgetcapability in sync mode. James ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH] tpm: Revert "tpm: fix invalid locking in NONBLOCKING mode" 2020-05-26 23:06 ` James Bottomley @ 2020-05-26 23:31 ` Alex Guzman 2020-05-27 0:18 ` James Bottomley 0 siblings, 1 reply; 20+ messages in thread From: Alex Guzman @ 2020-05-26 23:31 UTC (permalink / raw) To: James Bottomley Cc: Mario Limonciello, Peter Huewe, Jarkko Sakkinen, Jason Gunthorpe, Arnd Bergmann, Greg Kroah-Hartman, linux-integrity, linux-kernel, Jeffrin Jose T On Tue, May 26, 2020 at 4:06 PM James Bottomley <James.Bottomley@hansenpartnership.com> wrote: > > On Tue, 2020-05-26 at 15:19 -0700, Alex Guzman wrote: > [...] > > When using your patch, I get a hang when trying to use tpm2_getcap, > > and dmesg shows some info. > > Are you sure it's all applied? This > > > [ 570.913803] tpm_tcg_write_bytes+0x2f/0x40 > > [ 570.913805] release_locality+0x49/0x220 > > [ 570.913807] tpm_relinquish_locality+0x1f/0x40 > > [ 570.913808] tpm_chip_stop+0x21/0x40 > > [ 570.913810] tpm_put_ops+0x9/0x30 > > [ 570.913811] tpm_common_write+0x179/0x190 > > [ 570.913813] vfs_write+0xb1/0x1a0 > > Implies an unmatched tpm_put_ops() in the async write path, as though > this hunk: > > > @@ -211,11 +202,19 @@ ssize_t tpm_common_write(struct file *file, > > const char __user *buf, > > if (file->f_flags & O_NONBLOCK) { > > priv->command_enqueued = true; > > queue_work(tpm_dev_wq, &priv->async_work); > > - tpm_put_ops(priv->chip); > > mutex_unlock(&priv->buffer_mutex); > > return size; > > } > > Is missing. I actually booted the patch in my TPM based VM and it all > seems to work OK when I execute tpm2_getcap (I verified it's using > O_NONBLOCK) and tssgetcapability in sync mode. > > James > Oh, I did miss that bit. The patch had issues applying for some reason and I missed the single-line removal when I was looking at the diff. I gave it a spin on my machine again. getcap seems to work correctly with and without having the async config flag set for tpm2-tss. The pkcs11 plugin seems to work correctly again too. :) ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH] tpm: Revert "tpm: fix invalid locking in NONBLOCKING mode" 2020-05-26 23:31 ` Alex Guzman @ 2020-05-27 0:18 ` James Bottomley 0 siblings, 0 replies; 20+ messages in thread From: James Bottomley @ 2020-05-27 0:18 UTC (permalink / raw) To: Alex Guzman Cc: Mario Limonciello, Peter Huewe, Jarkko Sakkinen, Jason Gunthorpe, Arnd Bergmann, Greg Kroah-Hartman, linux-integrity, linux-kernel, Jeffrin Jose T On Tue, 2020-05-26 at 16:31 -0700, Alex Guzman wrote: > On Tue, May 26, 2020 at 4:06 PM James Bottomley > <James.Bottomley@hansenpartnership.com> wrote: > > > > On Tue, 2020-05-26 at 15:19 -0700, Alex Guzman wrote: > > [...] > > > When using your patch, I get a hang when trying to use > > > tpm2_getcap, and dmesg shows some info. > > > > Are you sure it's all applied? This > > > > > [ 570.913803] tpm_tcg_write_bytes+0x2f/0x40 > > > [ 570.913805] release_locality+0x49/0x220 > > > [ 570.913807] tpm_relinquish_locality+0x1f/0x40 > > > [ 570.913808] tpm_chip_stop+0x21/0x40 > > > [ 570.913810] tpm_put_ops+0x9/0x30 > > > [ 570.913811] tpm_common_write+0x179/0x190 > > > [ 570.913813] vfs_write+0xb1/0x1a0 > > > > Implies an unmatched tpm_put_ops() in the async write path, as > > though this hunk: > > > > > @@ -211,11 +202,19 @@ ssize_t tpm_common_write(struct file *file, > > > const char __user *buf, > > > if (file->f_flags & O_NONBLOCK) { > > > priv->command_enqueued = true; > > > queue_work(tpm_dev_wq, &priv->async_work); > > > - tpm_put_ops(priv->chip); > > > mutex_unlock(&priv->buffer_mutex); > > > return size; > > > } > > > > Is missing. I actually booted the patch in my TPM based VM and it > > all seems to work OK when I execute tpm2_getcap (I verified it's > > using O_NONBLOCK) and tssgetcapability in sync mode. > > > > James > > > > Oh, I did miss that bit. The patch had issues applying for some > reason and I missed the single-line removal when I was looking at the > diff. Sorry, that's likely my fault: I did it on top of my current TPM tree. I'll prepare a version against the vanilla kernel with a real changelog. > I gave it a spin on my machine again. getcap seems to work correctly > with and without having the async config flag set for tpm2-tss. The > pkcs11 plugin seems to work correctly again too. :) Great, thanks! I'll add your tested-by to the above. James ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH] tpm: Revert "tpm: fix invalid locking in NONBLOCKING mode" 2020-05-26 19:38 ` James Bottomley 2020-05-26 22:19 ` Alex Guzman @ 2020-05-27 20:09 ` Jarkko Sakkinen 2020-05-27 20:18 ` Mario.Limonciello 1 sibling, 1 reply; 20+ messages in thread From: Jarkko Sakkinen @ 2020-05-27 20:09 UTC (permalink / raw) To: James Bottomley, Mario.Limonciello, peterhuewe, jgg Cc: arnd, gregkh, linux-integrity, linux-kernel, jeffrin, alex On Tue, 2020-05-26 at 12:38 -0700, James Bottomley wrote: > On Tue, 2020-05-26 at 19:23 +0000, Mario.Limonciello@dell.com wrote: > > > On Tue, 2020-05-26 at 13:32 -0500, Mario Limonciello wrote: > > > > This reverts commit d23d12484307b40eea549b8a858f5fffad913897. > > > > > > > > This commit has caused regressions for the XPS 9560 containing > > > > a Nuvoton TPM. > > > > > > Presumably this is using the tis driver? > > > > Correct. > > > > > > As mentioned by the reporter all TPM2 commands are failing with: > > > > ERROR:tcti:src/tss2-tcti/tcti- > > > > device.c:290:tcti_device_receive() > > > > Failed to read response from fd 3, got errno 1: Operation not > > > > permitted > > > > > > > > The reporter bisected this issue back to this commit which was > > > > backported to stable as commit 4d6ebc4. > > > > > > I think the problem is request_locality ... for some inexplicable > > > reason a failure there returns -1, which is EPERM to user space. > > > > > > That seems to be a bug in the async code since everything else > > > gives a ESPIPE error if tpm_try_get_ops fails ... at least no-one > > > assumes it gives back a sensible return code. > > > > > > What I think is happening is that with the patch the TPM goes > > > through a quick sequence of request, relinquish, request, > > > relinquish and it's the third request which is failing (likely > > > timing out). Without the patch, the patch there's only one > > > request,relinquish cycle because the ops are held while the async > > > work is executed. I have a vague recollection that there is a > > > problem with too many locality request in quick succession, but > > > I'll defer to Jason, who I think understands the intricacies of > > > localities better than I do. > > > > Thanks, I don't pretend to understand the nuances of this particular > > code, but I was hoping that the request to revert got some attention > > since Alex's kernel Bugzilla and message a few months ago to linux > > integrity weren't. > > > > > If that's the problem, the solution looks simple enough: just move > > > the ops get down because the priv state is already protected by the > > > buffer mutex > > > > Yeah, if that works for Alex's situation it certainly sounds like a > > better solution than reverting this patch as this patch actually does > > fix a problem reported by Jeffrin originally. > > > > Could you propose a specific patch that Alex and Jeffrin can perhaps > > both try? > > Um, what's wrong with the one I originally attached and which you quote > below? It's only compile tested, but I think it will work, if the > theory is correct. Please send a legit patch, thanks. /Jarkko ^ permalink raw reply [flat|nested] 20+ messages in thread
* RE: [PATCH] tpm: Revert "tpm: fix invalid locking in NONBLOCKING mode" 2020-05-27 20:09 ` Jarkko Sakkinen @ 2020-05-27 20:18 ` Mario.Limonciello 2020-05-28 0:43 ` Jarkko Sakkinen 0 siblings, 1 reply; 20+ messages in thread From: Mario.Limonciello @ 2020-05-27 20:18 UTC (permalink / raw) To: jarkko.sakkinen, James.Bottomley, peterhuewe, jgg Cc: arnd, gregkh, linux-integrity, linux-kernel, jeffrin, alex > -----Original Message----- > From: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com> > Sent: Wednesday, May 27, 2020 3:09 PM > To: James Bottomley; Limonciello, Mario; peterhuewe@gmx.de; jgg@ziepe.ca > Cc: arnd@arndb.de; gregkh@linuxfoundation.org; linux-integrity@vger.kernel.org; > linux-kernel@vger.kernel.org; jeffrin@rajagiritech.edu.in; alex@guzman.io > Subject: Re: [PATCH] tpm: Revert "tpm: fix invalid locking in NONBLOCKING mode" > > > [EXTERNAL EMAIL] > > On Tue, 2020-05-26 at 12:38 -0700, James Bottomley wrote: > > On Tue, 2020-05-26 at 19:23 +0000, Mario.Limonciello@dell.com wrote: > > > > On Tue, 2020-05-26 at 13:32 -0500, Mario Limonciello wrote: > > > > > This reverts commit d23d12484307b40eea549b8a858f5fffad913897. > > > > > > > > > > This commit has caused regressions for the XPS 9560 containing > > > > > a Nuvoton TPM. > > > > > > > > Presumably this is using the tis driver? > > > > > > Correct. > > > > > > > > As mentioned by the reporter all TPM2 commands are failing with: > > > > > ERROR:tcti:src/tss2-tcti/tcti- > > > > > device.c:290:tcti_device_receive() > > > > > Failed to read response from fd 3, got errno 1: Operation not > > > > > permitted > > > > > > > > > > The reporter bisected this issue back to this commit which was > > > > > backported to stable as commit 4d6ebc4. > > > > > > > > I think the problem is request_locality ... for some inexplicable > > > > reason a failure there returns -1, which is EPERM to user space. > > > > > > > > That seems to be a bug in the async code since everything else > > > > gives a ESPIPE error if tpm_try_get_ops fails ... at least no-one > > > > assumes it gives back a sensible return code. > > > > > > > > What I think is happening is that with the patch the TPM goes > > > > through a quick sequence of request, relinquish, request, > > > > relinquish and it's the third request which is failing (likely > > > > timing out). Without the patch, the patch there's only one > > > > request,relinquish cycle because the ops are held while the async > > > > work is executed. I have a vague recollection that there is a > > > > problem with too many locality request in quick succession, but > > > > I'll defer to Jason, who I think understands the intricacies of > > > > localities better than I do. > > > > > > Thanks, I don't pretend to understand the nuances of this particular > > > code, but I was hoping that the request to revert got some attention > > > since Alex's kernel Bugzilla and message a few months ago to linux > > > integrity weren't. > > > > > > > If that's the problem, the solution looks simple enough: just move > > > > the ops get down because the priv state is already protected by the > > > > buffer mutex > > > > > > Yeah, if that works for Alex's situation it certainly sounds like a > > > better solution than reverting this patch as this patch actually does > > > fix a problem reported by Jeffrin originally. > > > > > > Could you propose a specific patch that Alex and Jeffrin can perhaps > > > both try? > > > > Um, what's wrong with the one I originally attached and which you quote > > below? It's only compile tested, but I think it will work, if the > > theory is correct. > > Please send a legit patch, thanks. > > /Jarkko Jarkko, After the confirmation from Alex that this patch attached to the end of the thread worked, James did send a proper patch that can be accessed here: https://lore.kernel.org/linux-integrity/20200527155800.ya43xm2ltuwduwjg@cantor/T/#t Thanks, ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH] tpm: Revert "tpm: fix invalid locking in NONBLOCKING mode" 2020-05-27 20:18 ` Mario.Limonciello @ 2020-05-28 0:43 ` Jarkko Sakkinen 2020-05-28 0:59 ` Mario.Limonciello 0 siblings, 1 reply; 20+ messages in thread From: Jarkko Sakkinen @ 2020-05-28 0:43 UTC (permalink / raw) To: Mario.Limonciello Cc: James.Bottomley, peterhuewe, jgg, arnd, gregkh, linux-integrity, linux-kernel, jeffrin, alex On Wed, May 27, 2020 at 08:18:56PM +0000, Mario.Limonciello@dell.com wrote: > > -----Original Message----- > > From: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com> > > Sent: Wednesday, May 27, 2020 3:09 PM > > To: James Bottomley; Limonciello, Mario; peterhuewe@gmx.de; jgg@ziepe.ca > > Cc: arnd@arndb.de; gregkh@linuxfoundation.org; linux-integrity@vger.kernel.org; > > linux-kernel@vger.kernel.org; jeffrin@rajagiritech.edu.in; alex@guzman.io > > Subject: Re: [PATCH] tpm: Revert "tpm: fix invalid locking in NONBLOCKING mode" > > > > > > [EXTERNAL EMAIL] What is this? > > On Tue, 2020-05-26 at 12:38 -0700, James Bottomley wrote: > > > On Tue, 2020-05-26 at 19:23 +0000, Mario.Limonciello@dell.com wrote: > > > > > On Tue, 2020-05-26 at 13:32 -0500, Mario Limonciello wrote: > > > > > > This reverts commit d23d12484307b40eea549b8a858f5fffad913897. > > > > > > > > > > > > This commit has caused regressions for the XPS 9560 containing > > > > > > a Nuvoton TPM. > > > > > > > > > > Presumably this is using the tis driver? > > > > > > > > Correct. > > > > > > > > > > As mentioned by the reporter all TPM2 commands are failing with: > > > > > > ERROR:tcti:src/tss2-tcti/tcti- > > > > > > device.c:290:tcti_device_receive() > > > > > > Failed to read response from fd 3, got errno 1: Operation not > > > > > > permitted > > > > > > > > > > > > The reporter bisected this issue back to this commit which was > > > > > > backported to stable as commit 4d6ebc4. > > > > > > > > > > I think the problem is request_locality ... for some inexplicable > > > > > reason a failure there returns -1, which is EPERM to user space. > > > > > > > > > > That seems to be a bug in the async code since everything else > > > > > gives a ESPIPE error if tpm_try_get_ops fails ... at least no-one > > > > > assumes it gives back a sensible return code. > > > > > > > > > > What I think is happening is that with the patch the TPM goes > > > > > through a quick sequence of request, relinquish, request, > > > > > relinquish and it's the third request which is failing (likely > > > > > timing out). Without the patch, the patch there's only one > > > > > request,relinquish cycle because the ops are held while the async > > > > > work is executed. I have a vague recollection that there is a > > > > > problem with too many locality request in quick succession, but > > > > > I'll defer to Jason, who I think understands the intricacies of > > > > > localities better than I do. > > > > > > > > Thanks, I don't pretend to understand the nuances of this particular > > > > code, but I was hoping that the request to revert got some attention > > > > since Alex's kernel Bugzilla and message a few months ago to linux > > > > integrity weren't. > > > > > > > > > If that's the problem, the solution looks simple enough: just move > > > > > the ops get down because the priv state is already protected by the > > > > > buffer mutex > > > > > > > > Yeah, if that works for Alex's situation it certainly sounds like a > > > > better solution than reverting this patch as this patch actually does > > > > fix a problem reported by Jeffrin originally. > > > > > > > > Could you propose a specific patch that Alex and Jeffrin can perhaps > > > > both try? > > > > > > Um, what's wrong with the one I originally attached and which you quote > > > below? It's only compile tested, but I think it will work, if the > > > theory is correct. > > > > Please send a legit patch, thanks. > > > > /Jarkko > > Jarkko, > > After the confirmation from Alex that this patch attached to the end of the thread > worked, James did send a proper patch that can be accessed here: > https://lore.kernel.org/linux-integrity/20200527155800.ya43xm2ltuwduwjg@cantor/T/#t > > Thanks, Hi thanks a lot! I did read the full discussions and agree with the conclusions as I get a patch in proper form. Please ping next time a bit earlier. It's not that I don't want to deal with the issues quickly as possible. It's probably just that I've forgot something or missed. /Jarkko ^ permalink raw reply [flat|nested] 20+ messages in thread
* RE: [PATCH] tpm: Revert "tpm: fix invalid locking in NONBLOCKING mode" 2020-05-28 0:43 ` Jarkko Sakkinen @ 2020-05-28 0:59 ` Mario.Limonciello 2020-05-28 6:53 ` Jarkko Sakkinen 0 siblings, 1 reply; 20+ messages in thread From: Mario.Limonciello @ 2020-05-28 0:59 UTC (permalink / raw) To: jarkko.sakkinen Cc: James.Bottomley, peterhuewe, jgg, arnd, gregkh, linux-integrity, linux-kernel, jeffrin, alex > > > [EXTERNAL EMAIL] > > What is this? Something my employer's mail system automatically tags in external email. My mistakes in forgetting to remove it on the response. > > > > On Tue, 2020-05-26 at 12:38 -0700, James Bottomley wrote: > > > > On Tue, 2020-05-26 at 19:23 +0000, Mario.Limonciello@dell.com wrote: > > > > > > On Tue, 2020-05-26 at 13:32 -0500, Mario Limonciello wrote: > > > > > > > This reverts commit d23d12484307b40eea549b8a858f5fffad913897. > > > > > > > > > > > > > > This commit has caused regressions for the XPS 9560 containing > > > > > > > a Nuvoton TPM. > > > > > > > > > > > > Presumably this is using the tis driver? > > > > > > > > > > Correct. > > > > > > > > > > > > As mentioned by the reporter all TPM2 commands are failing with: > > > > > > > ERROR:tcti:src/tss2-tcti/tcti- > > > > > > > device.c:290:tcti_device_receive() > > > > > > > Failed to read response from fd 3, got errno 1: Operation not > > > > > > > permitted > > > > > > > > > > > > > > The reporter bisected this issue back to this commit which was > > > > > > > backported to stable as commit 4d6ebc4. > > > > > > > > > > > > I think the problem is request_locality ... for some inexplicable > > > > > > reason a failure there returns -1, which is EPERM to user space. > > > > > > > > > > > > That seems to be a bug in the async code since everything else > > > > > > gives a ESPIPE error if tpm_try_get_ops fails ... at least no-one > > > > > > assumes it gives back a sensible return code. > > > > > > > > > > > > What I think is happening is that with the patch the TPM goes > > > > > > through a quick sequence of request, relinquish, request, > > > > > > relinquish and it's the third request which is failing (likely > > > > > > timing out). Without the patch, the patch there's only one > > > > > > request,relinquish cycle because the ops are held while the async > > > > > > work is executed. I have a vague recollection that there is a > > > > > > problem with too many locality request in quick succession, but > > > > > > I'll defer to Jason, who I think understands the intricacies of > > > > > > localities better than I do. > > > > > > > > > > Thanks, I don't pretend to understand the nuances of this particular > > > > > code, but I was hoping that the request to revert got some attention > > > > > since Alex's kernel Bugzilla and message a few months ago to linux > > > > > integrity weren't. > > > > > > > > > > > If that's the problem, the solution looks simple enough: just move > > > > > > the ops get down because the priv state is already protected by the > > > > > > buffer mutex > > > > > > > > > > Yeah, if that works for Alex's situation it certainly sounds like a > > > > > better solution than reverting this patch as this patch actually does > > > > > fix a problem reported by Jeffrin originally. > > > > > > > > > > Could you propose a specific patch that Alex and Jeffrin can perhaps > > > > > both try? > > > > > > > > Um, what's wrong with the one I originally attached and which you quote > > > > below? It's only compile tested, but I think it will work, if the > > > > theory is correct. > > > > > > Please send a legit patch, thanks. > > > > > > /Jarkko > > > > Jarkko, > > > > After the confirmation from Alex that this patch attached to the end of the > thread > > worked, James did send a proper patch that can be accessed here: > > https://lore.kernel.org/linux- > integrity/20200527155800.ya43xm2ltuwduwjg@cantor/T/#t > > > > Thanks, > > Hi thanks a lot! I did read the full discussions and agree with the > conclusions as I get a patch in proper form. > > Please ping next time a bit earlier. It's not that I don't want to deal > with the issues quickly as possible. It's probably just that I've forgot > something or missed. > > /Jarkko Thanks! I completely forgot about it too, it was mentioned to me right after holidays and I forgot to follow up and see that it got sorted. ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH] tpm: Revert "tpm: fix invalid locking in NONBLOCKING mode" 2020-05-28 0:59 ` Mario.Limonciello @ 2020-05-28 6:53 ` Jarkko Sakkinen 0 siblings, 0 replies; 20+ messages in thread From: Jarkko Sakkinen @ 2020-05-28 6:53 UTC (permalink / raw) To: Mario.Limonciello Cc: James.Bottomley, peterhuewe, jgg, arnd, gregkh, linux-integrity, linux-kernel, jeffrin, alex On Thu, May 28, 2020 at 12:59:59AM +0000, Mario.Limonciello@dell.com wrote: > > > > [EXTERNAL EMAIL] > > > > What is this? > > Something my employer's mail system automatically tags in external email. > > My mistakes in forgetting to remove it on the response. NP, just asking :-) > > > > On Tue, 2020-05-26 at 12:38 -0700, James Bottomley wrote: > > > > > On Tue, 2020-05-26 at 19:23 +0000, Mario.Limonciello@dell.com wrote: > > > > > > > On Tue, 2020-05-26 at 13:32 -0500, Mario Limonciello wrote: > > > > > > > > This reverts commit d23d12484307b40eea549b8a858f5fffad913897. > > > > > > > > > > > > > > > > This commit has caused regressions for the XPS 9560 containing > > > > > > > > a Nuvoton TPM. > > > > > > > > > > > > > > Presumably this is using the tis driver? > > > > > > > > > > > > Correct. > > > > > > > > > > > > > > As mentioned by the reporter all TPM2 commands are failing with: > > > > > > > > ERROR:tcti:src/tss2-tcti/tcti- > > > > > > > > device.c:290:tcti_device_receive() > > > > > > > > Failed to read response from fd 3, got errno 1: Operation not > > > > > > > > permitted > > > > > > > > > > > > > > > > The reporter bisected this issue back to this commit which was > > > > > > > > backported to stable as commit 4d6ebc4. > > > > > > > > > > > > > > I think the problem is request_locality ... for some inexplicable > > > > > > > reason a failure there returns -1, which is EPERM to user space. > > > > > > > > > > > > > > That seems to be a bug in the async code since everything else > > > > > > > gives a ESPIPE error if tpm_try_get_ops fails ... at least no-one > > > > > > > assumes it gives back a sensible return code. > > > > > > > > > > > > > > What I think is happening is that with the patch the TPM goes > > > > > > > through a quick sequence of request, relinquish, request, > > > > > > > relinquish and it's the third request which is failing (likely > > > > > > > timing out). Without the patch, the patch there's only one > > > > > > > request,relinquish cycle because the ops are held while the async > > > > > > > work is executed. I have a vague recollection that there is a > > > > > > > problem with too many locality request in quick succession, but > > > > > > > I'll defer to Jason, who I think understands the intricacies of > > > > > > > localities better than I do. > > > > > > > > > > > > Thanks, I don't pretend to understand the nuances of this particular > > > > > > code, but I was hoping that the request to revert got some attention > > > > > > since Alex's kernel Bugzilla and message a few months ago to linux > > > > > > integrity weren't. > > > > > > > > > > > > > If that's the problem, the solution looks simple enough: just move > > > > > > > the ops get down because the priv state is already protected by the > > > > > > > buffer mutex > > > > > > > > > > > > Yeah, if that works for Alex's situation it certainly sounds like a > > > > > > better solution than reverting this patch as this patch actually does > > > > > > fix a problem reported by Jeffrin originally. > > > > > > > > > > > > Could you propose a specific patch that Alex and Jeffrin can perhaps > > > > > > both try? > > > > > > > > > > Um, what's wrong with the one I originally attached and which you quote > > > > > below? It's only compile tested, but I think it will work, if the > > > > > theory is correct. > > > > > > > > Please send a legit patch, thanks. > > > > > > > > /Jarkko > > > > > > Jarkko, > > > > > > After the confirmation from Alex that this patch attached to the end of the > > thread > > > worked, James did send a proper patch that can be accessed here: > > > https://lore.kernel.org/linux- > > integrity/20200527155800.ya43xm2ltuwduwjg@cantor/T/#t > > > > > > Thanks, > > > > Hi thanks a lot! I did read the full discussions and agree with the > > conclusions as I get a patch in proper form. > > > > Please ping next time a bit earlier. It's not that I don't want to deal > > with the issues quickly as possible. It's probably just that I've forgot > > something or missed. > > > > /Jarkko > > Thanks! > > I completely forgot about it too, it was mentioned to me right after holidays > and I forgot to follow up and see that it got sorted. Yeah, sure, lets try to get a fix landed asap :-) /Jarkko ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH] tpm: Revert "tpm: fix invalid locking in NONBLOCKING mode" 2020-05-26 19:14 ` James Bottomley 2020-05-26 19:23 ` Mario.Limonciello @ 2020-05-26 19:39 ` Tadeusz Struk 2020-05-26 20:00 ` James Bottomley 2020-05-28 0:30 ` Jarkko Sakkinen 1 sibling, 2 replies; 20+ messages in thread From: Tadeusz Struk @ 2020-05-26 19:39 UTC (permalink / raw) To: James Bottomley, Mario Limonciello, Peter Huewe, Jarkko Sakkinen, Jason Gunthorpe Cc: Arnd Bergmann, Greg Kroah-Hartman, linux-integrity, linux-kernel, Jeffrin Jose T, Alex Guzman On 5/26/20 12:14 PM, James Bottomley wrote: > + /* atomic tpm command send and result receive. We only hold the ops > + * lock during this period so that the tpm can be unregistered even if > + * the char dev is held open. > + */ > + if (tpm_try_get_ops(priv->chip)) { > + ret = -EPIPE; > + goto out; > + } > + Hi James, This won't help if the message is read by an async tcti. If the problem lies in the chip get locality code, perhaps this could help to debug the root-cause instead of masking it out in the upper layer code: diff --git a/drivers/char/tpm/tpm_tis_core.c b/drivers/char/tpm/tpm_tis_core.c index 2435216bd10a..da5ecd0376bf 100644 --- a/drivers/char/tpm/tpm_tis_core.c +++ b/drivers/char/tpm/tpm_tis_core.c @@ -202,20 +202,22 @@ static int request_locality(struct tpm_chip *chip, int l) return rc; stop = jiffies + chip->timeout_a; + timeout = stop - jiffies; if (chip->flags & TPM_CHIP_FLAG_IRQ) { again: timeout = stop - jiffies; if ((long)timeout <= 0) - return -1; + goto out; + rc = wait_event_interruptible_timeout(priv->int_queue, - (check_locality - (chip, l)), + check_locality(chip, l), timeout); if (rc > 0) return l; if (rc == -ERESTARTSYS && freezing(current)) { clear_thread_flag(TIF_SIGPENDING); + timeout = stop - jiffies; goto again; } } else { @@ -226,6 +228,10 @@ static int request_locality(struct tpm_chip *chip, int l) tpm_msleep(TPM_TIMEOUT); } while (time_before(jiffies, stop)); } +out: + dev_warn(&chip->dev, "%s: failed to request locality %d after %lu ms\n", + __func__, l, timeout * HZ / 1000); + return -1; } ^ permalink raw reply related [flat|nested] 20+ messages in thread
* Re: [PATCH] tpm: Revert "tpm: fix invalid locking in NONBLOCKING mode" 2020-05-26 19:39 ` Tadeusz Struk @ 2020-05-26 20:00 ` James Bottomley 2020-05-26 21:33 ` Tadeusz Struk 2020-05-28 0:30 ` Jarkko Sakkinen 1 sibling, 1 reply; 20+ messages in thread From: James Bottomley @ 2020-05-26 20:00 UTC (permalink / raw) To: Tadeusz Struk, Mario Limonciello, Peter Huewe, Jarkko Sakkinen, Jason Gunthorpe Cc: Arnd Bergmann, Greg Kroah-Hartman, linux-integrity, linux-kernel, Jeffrin Jose T, Alex Guzman On Tue, 2020-05-26 at 12:39 -0700, Tadeusz Struk wrote: > On 5/26/20 12:14 PM, James Bottomley wrote: > > + /* atomic tpm command send and result receive. We only > > hold the ops > > + * lock during this period so that the tpm can be > > unregistered even if > > + * the char dev is held open. > > + */ > > + if (tpm_try_get_ops(priv->chip)) { > > + ret = -EPIPE; > > + goto out; > > + } > > + > > Hi James, > This won't help if the message is read by an async tcti. Why not? It moves the ops get underneath the async path, so it's now all done in the direct read or the async read. That seems to be more efficient. > If the problem lies in the chip get locality code, perhaps this > could help to debug the root-cause instead of masking it out in the > upper layer code: I don't think there is a root cause other than a TIS TPM is getting annoyed by us cycling localities too rapidly because we don't do an actual TPM operation between request and relinquish. Since the first request/relinquish seems unnecessary for the async case, moving the ops get eliminates the problem. James ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH] tpm: Revert "tpm: fix invalid locking in NONBLOCKING mode" 2020-05-26 20:00 ` James Bottomley @ 2020-05-26 21:33 ` Tadeusz Struk 2020-05-26 22:34 ` Alex Guzman 0 siblings, 1 reply; 20+ messages in thread From: Tadeusz Struk @ 2020-05-26 21:33 UTC (permalink / raw) To: James Bottomley, Mario Limonciello, Peter Huewe, Jarkko Sakkinen, Jason Gunthorpe Cc: Arnd Bergmann, Greg Kroah-Hartman, linux-integrity, linux-kernel, Jeffrin Jose T, Alex Guzman On 5/26/20 1:00 PM, James Bottomley wrote: > I don't think there is a root cause other than a TIS TPM is getting > annoyed by us cycling localities too rapidly because we don't do an > actual TPM operation between request and relinquish. Since the first > request/relinquish seems unnecessary for the async case, moving the ops > get eliminates the problem. Could be, so maybe we could try both patches. More debug info on the error path won't hurt. Thanks, Tadeusz ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH] tpm: Revert "tpm: fix invalid locking in NONBLOCKING mode" 2020-05-26 21:33 ` Tadeusz Struk @ 2020-05-26 22:34 ` Alex Guzman 0 siblings, 0 replies; 20+ messages in thread From: Alex Guzman @ 2020-05-26 22:34 UTC (permalink / raw) To: Tadeusz Struk, James Bottomley, Mario Limonciello, Peter Huewe, Jarkko Sakkinen, Jason Gunthorpe Cc: Arnd Bergmann, Greg Kroah-Hartman, linux-integrity, linux-kernel, Jeffrin Jose T On Tue, 2020-05-26 at 14:33 -0700, Tadeusz Struk wrote: > On 5/26/20 1:00 PM, James Bottomley wrote: > > I don't think there is a root cause other than a TIS TPM is getting > > annoyed by us cycling localities too rapidly because we don't do an > > actual TPM operation between request and relinquish. Since the > > first > > request/relinquish seems unnecessary for the async case, moving the > > ops > > get eliminates the problem. > > Could be, so maybe we could try both patches. > More debug info on the error path won't hurt. > Thanks, > Tadeusz With your logging patch, I consistently see this message in dmesg when tpm2_getcap fails: tpm tpm0: request_locality: failed to request locality 0 after 750 ms ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH] tpm: Revert "tpm: fix invalid locking in NONBLOCKING mode" 2020-05-26 19:39 ` Tadeusz Struk 2020-05-26 20:00 ` James Bottomley @ 2020-05-28 0:30 ` Jarkko Sakkinen 2020-05-28 4:40 ` Tadeusz Struk 1 sibling, 1 reply; 20+ messages in thread From: Jarkko Sakkinen @ 2020-05-28 0:30 UTC (permalink / raw) To: Tadeusz Struk Cc: James Bottomley, Mario Limonciello, Peter Huewe, Jason Gunthorpe, Arnd Bergmann, Greg Kroah-Hartman, linux-integrity, linux-kernel, Jeffrin Jose T, Alex Guzman On Tue, May 26, 2020 at 12:39:37PM -0700, Tadeusz Struk wrote: > On 5/26/20 12:14 PM, James Bottomley wrote: > > + /* atomic tpm command send and result receive. We only hold the ops > > + * lock during this period so that the tpm can be unregistered even if > > + * the char dev is held open. > > + */ > > + if (tpm_try_get_ops(priv->chip)) { > > + ret = -EPIPE; > > + goto out; > > + } > > + > Hi James, > This won't help if the message is read by an async tcti. If the problem lies > in the chip get locality code, perhaps this could help to debug the root-cause > instead of masking it out in the upper layer code: What is TCTI and async TCTI? Not following. > diff --git a/drivers/char/tpm/tpm_tis_core.c b/drivers/char/tpm/tpm_tis_core.c > index 2435216bd10a..da5ecd0376bf 100644 > --- a/drivers/char/tpm/tpm_tis_core.c > +++ b/drivers/char/tpm/tpm_tis_core.c > @@ -202,20 +202,22 @@ static int request_locality(struct tpm_chip *chip, int l) > return rc; > > stop = jiffies + chip->timeout_a; > + timeout = stop - jiffies; > > if (chip->flags & TPM_CHIP_FLAG_IRQ) { > again: > timeout = stop - jiffies; > if ((long)timeout <= 0) > - return -1; > + goto out; > + > rc = wait_event_interruptible_timeout(priv->int_queue, > - (check_locality > - (chip, l)), > + check_locality(chip, l), > timeout); > if (rc > 0) > return l; > if (rc == -ERESTARTSYS && freezing(current)) { > clear_thread_flag(TIF_SIGPENDING); > + timeout = stop - jiffies; > goto again; > } > } else { > @@ -226,6 +228,10 @@ static int request_locality(struct tpm_chip *chip, int l) > tpm_msleep(TPM_TIMEOUT); > } while (time_before(jiffies, stop)); > } > +out: > + dev_warn(&chip->dev, "%s: failed to request locality %d after %lu ms\n", > + __func__, l, timeout * HZ / 1000); > + > return -1; > } > /Jarkko ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH] tpm: Revert "tpm: fix invalid locking in NONBLOCKING mode" 2020-05-28 0:30 ` Jarkko Sakkinen @ 2020-05-28 4:40 ` Tadeusz Struk 2020-05-28 6:55 ` Jarkko Sakkinen 0 siblings, 1 reply; 20+ messages in thread From: Tadeusz Struk @ 2020-05-28 4:40 UTC (permalink / raw) To: Jarkko Sakkinen Cc: James Bottomley, Mario Limonciello, Peter Huewe, Jason Gunthorpe, Arnd Bergmann, Greg Kroah-Hartman, linux-integrity, linux-kernel, Jeffrin Jose T, Alex Guzman On 5/27/20 5:30 PM, Jarkko Sakkinen wrote: >> This won't help if the message is read by an async tcti. If the problem lies >> in the chip get locality code, perhaps this could help to debug the root-cause >> instead of masking it out in the upper layer code: > What is TCTI and async TCTI? Not following. TPM Command Transmission Interface (TCTI) as defined by TCG in https://trustedcomputinggroup.org/resource/tss-tcti-specification/ the reason we added the O_NONBLOCK mode was to satisfy the TCG spec for async TCTI. Thanks, Tadeusz ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH] tpm: Revert "tpm: fix invalid locking in NONBLOCKING mode" 2020-05-28 4:40 ` Tadeusz Struk @ 2020-05-28 6:55 ` Jarkko Sakkinen 0 siblings, 0 replies; 20+ messages in thread From: Jarkko Sakkinen @ 2020-05-28 6:55 UTC (permalink / raw) To: Tadeusz Struk Cc: James Bottomley, Mario Limonciello, Peter Huewe, Jason Gunthorpe, Arnd Bergmann, Greg Kroah-Hartman, linux-integrity, linux-kernel, Jeffrin Jose T, Alex Guzman On Wed, May 27, 2020 at 09:40:25PM -0700, Tadeusz Struk wrote: > On 5/27/20 5:30 PM, Jarkko Sakkinen wrote: > >> This won't help if the message is read by an async tcti. If the problem lies > >> in the chip get locality code, perhaps this could help to debug the root-cause > >> instead of masking it out in the upper layer code: > > What is TCTI and async TCTI? Not following. > > TPM Command Transmission Interface (TCTI) as defined by TCG in > https://trustedcomputinggroup.org/resource/tss-tcti-specification/ > > the reason we added the O_NONBLOCK mode was to satisfy the TCG spec for async TCTI. > > Thanks, > Tadeusz OK, thanks recalling. /Jarkko ^ permalink raw reply [flat|nested] 20+ messages in thread
end of thread, other threads:[~2020-05-28 6:55 UTC | newest] Thread overview: 20+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2020-05-26 18:32 [PATCH] tpm: Revert "tpm: fix invalid locking in NONBLOCKING mode" Mario Limonciello 2020-05-26 19:14 ` James Bottomley 2020-05-26 19:23 ` Mario.Limonciello 2020-05-26 19:38 ` James Bottomley 2020-05-26 22:19 ` Alex Guzman 2020-05-26 23:06 ` James Bottomley 2020-05-26 23:31 ` Alex Guzman 2020-05-27 0:18 ` James Bottomley 2020-05-27 20:09 ` Jarkko Sakkinen 2020-05-27 20:18 ` Mario.Limonciello 2020-05-28 0:43 ` Jarkko Sakkinen 2020-05-28 0:59 ` Mario.Limonciello 2020-05-28 6:53 ` Jarkko Sakkinen 2020-05-26 19:39 ` Tadeusz Struk 2020-05-26 20:00 ` James Bottomley 2020-05-26 21:33 ` Tadeusz Struk 2020-05-26 22:34 ` Alex Guzman 2020-05-28 0:30 ` Jarkko Sakkinen 2020-05-28 4:40 ` Tadeusz Struk 2020-05-28 6:55 ` Jarkko Sakkinen
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).