All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] tpm_tis: work around status register bug in STMicroelectronics TPM
@ 2020-04-15 22:45 Omar Sandoval
  2020-04-15 23:51 ` James Bottomley
  2020-04-16 17:08 ` Jarkko Sakkinen
  0 siblings, 2 replies; 24+ messages in thread
From: Omar Sandoval @ 2020-04-15 22:45 UTC (permalink / raw)
  To: Peter Huewe, Jarkko Sakkinen, Jason Gunthorpe, linux-integrity

From: Omar Sandoval <osandov@fb.com>

We've encountered a particular model of STMicroelectronics TPM that
transiently returns a bad value in the status register. This causes the
kernel to believe that the TPM is ready to receive a command when it
actually isn't, which in turn causes the send to time out in
get_burstcount(). In testing, reading the status register one extra time
convinces the TPM to return a valid value.

Signed-off-by: Omar Sandoval <osandov@fb.com>
---
 drivers/char/tpm/tpm_tis_core.c | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/drivers/char/tpm/tpm_tis_core.c b/drivers/char/tpm/tpm_tis_core.c
index 27c6ca031e23..277a21027fc7 100644
--- a/drivers/char/tpm/tpm_tis_core.c
+++ b/drivers/char/tpm/tpm_tis_core.c
@@ -238,6 +238,18 @@ static u8 tpm_tis_status(struct tpm_chip *chip)
 	rc = tpm_tis_read8(priv, TPM_STS(priv->locality), &status);
 	if (rc < 0)
 		return 0;
+	/*
+	 * Some STMicroelectronics TPMs have a bug where the status register is
+	 * sometimes bogus (all 1s) if read immediately after the access
+	 * register is written to. Bits 0, 1, and 5 are always supposed to read
+	 * as 0, so this is clearly invalid. Reading the register a second time
+	 * returns a valid value.
+	 */
+	if (unlikely(status == 0xff)) {
+		rc = tpm_tis_read8(priv, TPM_STS(priv->locality), &status);
+		if (rc < 0)
+			return 0;
+	}
 
 	return status;
 }
-- 
2.26.1


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* Re: [PATCH] tpm_tis: work around status register bug in STMicroelectronics TPM
  2020-04-15 22:45 [PATCH] tpm_tis: work around status register bug in STMicroelectronics TPM Omar Sandoval
@ 2020-04-15 23:51 ` James Bottomley
  2020-04-16  0:16   ` Omar Sandoval
                     ` (2 more replies)
  2020-04-16 17:08 ` Jarkko Sakkinen
  1 sibling, 3 replies; 24+ messages in thread
From: James Bottomley @ 2020-04-15 23:51 UTC (permalink / raw)
  To: Omar Sandoval, Peter Huewe, Jarkko Sakkinen, Jason Gunthorpe,
	linux-integrity

On Wed, 2020-04-15 at 15:45 -0700, Omar Sandoval wrote:
> From: Omar Sandoval <osandov@fb.com>
> 
> We've encountered a particular model of STMicroelectronics TPM that
> transiently returns a bad value in the status register. This causes
> the kernel to believe that the TPM is ready to receive a command when
> it actually isn't, which in turn causes the send to time out in
> get_burstcount(). In testing, reading the status register one extra
> time convinces the TPM to return a valid value.

Interesting, I've got a very early upgradeable nuvoton that seems to be
behaving like this.

> Signed-off-by: Omar Sandoval <osandov@fb.com>
> ---
>  drivers/char/tpm/tpm_tis_core.c | 12 ++++++++++++
>  1 file changed, 12 insertions(+)
> 
> diff --git a/drivers/char/tpm/tpm_tis_core.c
> b/drivers/char/tpm/tpm_tis_core.c
> index 27c6ca031e23..277a21027fc7 100644
> --- a/drivers/char/tpm/tpm_tis_core.c
> +++ b/drivers/char/tpm/tpm_tis_core.c
> @@ -238,6 +238,18 @@ static u8 tpm_tis_status(struct tpm_chip *chip)
>  	rc = tpm_tis_read8(priv, TPM_STS(priv->locality), &status);
>  	if (rc < 0)
>  		return 0;
> +	/*
> +	 * Some STMicroelectronics TPMs have a bug where the status
> register is
> +	 * sometimes bogus (all 1s) if read immediately after the
> access
> +	 * register is written to. Bits 0, 1, and 5 are always
> supposed to read
> +	 * as 0, so this is clearly invalid. Reading the register a
> second time
> +	 * returns a valid value.
> +	 */
> +	if (unlikely(status == 0xff)) {
> +		rc = tpm_tis_read8(priv, TPM_STS(priv->locality),
> &status);
> +		if (rc < 0)
> +			return 0;
> +	}

You theorize that your case is fixed by the second read, but what if it
isn't and the second read also returns 0xff?  Shouldn't we have a line
here saying

if (unlikely(status == 0xff))
	status = 0;

So if we get a second 0xff we just pretend the thing isn't ready?

James

>  	return status;
>  }


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH] tpm_tis: work around status register bug in STMicroelectronics TPM
  2020-04-15 23:51 ` James Bottomley
@ 2020-04-16  0:16   ` Omar Sandoval
  2020-04-16  0:24     ` Omar Sandoval
  2020-04-16 17:09   ` Jarkko Sakkinen
  2020-08-27 15:24   ` Jason Andryuk
  2 siblings, 1 reply; 24+ messages in thread
From: Omar Sandoval @ 2020-04-16  0:16 UTC (permalink / raw)
  To: James Bottomley
  Cc: Peter Huewe, Jarkko Sakkinen, Jason Gunthorpe, linux-integrity

On Wed, Apr 15, 2020 at 04:51:39PM -0700, James Bottomley wrote:
> On Wed, 2020-04-15 at 15:45 -0700, Omar Sandoval wrote:
> > From: Omar Sandoval <osandov@fb.com>
> > 
> > We've encountered a particular model of STMicroelectronics TPM that
> > transiently returns a bad value in the status register. This causes
> > the kernel to believe that the TPM is ready to receive a command when
> > it actually isn't, which in turn causes the send to time out in
> > get_burstcount(). In testing, reading the status register one extra
> > time convinces the TPM to return a valid value.
> 
> Interesting, I've got a very early upgradeable nuvoton that seems to be
> behaving like this.

I'll attach the userspace reproducer I used to figure this out. I'd be
interested to see if it times out on your TPM, too. Note that it bangs
on /dev/mem and assumes that the MMIO address is 0xfed40000. That seems
to be the hard-coded address for x86 in the kernel, but just to be safe
you might want to check `grep MSFT0101 /proc/iomem`.

> > Signed-off-by: Omar Sandoval <osandov@fb.com>
> > ---
> >  drivers/char/tpm/tpm_tis_core.c | 12 ++++++++++++
> >  1 file changed, 12 insertions(+)
> > 
> > diff --git a/drivers/char/tpm/tpm_tis_core.c
> > b/drivers/char/tpm/tpm_tis_core.c
> > index 27c6ca031e23..277a21027fc7 100644
> > --- a/drivers/char/tpm/tpm_tis_core.c
> > +++ b/drivers/char/tpm/tpm_tis_core.c
> > @@ -238,6 +238,18 @@ static u8 tpm_tis_status(struct tpm_chip *chip)
> >  	rc = tpm_tis_read8(priv, TPM_STS(priv->locality), &status);
> >  	if (rc < 0)
> >  		return 0;
> > +	/*
> > +	 * Some STMicroelectronics TPMs have a bug where the status
> > register is
> > +	 * sometimes bogus (all 1s) if read immediately after the
> > access
> > +	 * register is written to. Bits 0, 1, and 5 are always
> > supposed to read
> > +	 * as 0, so this is clearly invalid. Reading the register a
> > second time
> > +	 * returns a valid value.
> > +	 */
> > +	if (unlikely(status == 0xff)) {
> > +		rc = tpm_tis_read8(priv, TPM_STS(priv->locality),
> > &status);
> > +		if (rc < 0)
> > +			return 0;
> > +	}
> 
> You theorize that your case is fixed by the second read, but what if it
> isn't and the second read also returns 0xff?  Shouldn't we have a line
> here saying
> 
> if (unlikely(status == 0xff))
> 	status = 0;
> 
> So if we get a second 0xff we just pretend the thing isn't ready?

We've been running this workaround in production for awhile and the
hangs haven't happened since, and my userspace reproducer never
witnessed a second 0xff. But it wouldn't hurt, so I can add it anyways.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH] tpm_tis: work around status register bug in STMicroelectronics TPM
  2020-04-16  0:16   ` Omar Sandoval
@ 2020-04-16  0:24     ` Omar Sandoval
  2020-04-16 18:02       ` James Bottomley
  0 siblings, 1 reply; 24+ messages in thread
From: Omar Sandoval @ 2020-04-16  0:24 UTC (permalink / raw)
  To: James Bottomley
  Cc: Peter Huewe, Jarkko Sakkinen, Jason Gunthorpe, linux-integrity

[-- Attachment #1: Type: text/plain, Size: 1154 bytes --]

On Wed, Apr 15, 2020 at 05:16:05PM -0700, Omar Sandoval wrote:
> On Wed, Apr 15, 2020 at 04:51:39PM -0700, James Bottomley wrote:
> > On Wed, 2020-04-15 at 15:45 -0700, Omar Sandoval wrote:
> > > From: Omar Sandoval <osandov@fb.com>
> > > 
> > > We've encountered a particular model of STMicroelectronics TPM that
> > > transiently returns a bad value in the status register. This causes
> > > the kernel to believe that the TPM is ready to receive a command when
> > > it actually isn't, which in turn causes the send to time out in
> > > get_burstcount(). In testing, reading the status register one extra
> > > time convinces the TPM to return a valid value.
> > 
> > Interesting, I've got a very early upgradeable nuvoton that seems to be
> > behaving like this.
> 
> I'll attach the userspace reproducer I used to figure this out. I'd be
> interested to see if it times out on your TPM, too. Note that it bangs
> on /dev/mem and assumes that the MMIO address is 0xfed40000. That seems
> to be the hard-coded address for x86 in the kernel, but just to be safe
> you might want to check `grep MSFT0101 /proc/iomem`.

Forgot to attach it, of course...

[-- Attachment #2: test_tpm_tis.c --]
[-- Type: text/plain, Size: 2123 bytes --]

#include <fcntl.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>

enum tis_access {
	TPM_ACCESS_VALID = 0x80,
	TPM_ACCESS_ACTIVE_LOCALITY = 0x20,
	TPM_ACCESS_REQUEST_PENDING = 0x04,
	TPM_ACCESS_REQUEST_USE = 0x02,
};

enum tis_status {
	TPM_STS_VALID = 0x80,
	TPM_STS_COMMAND_READY = 0x40,
	TPM_STS_GO = 0x20,
	TPM_STS_DATA_AVAIL = 0x10,
	TPM_STS_DATA_EXPECT = 0x08,
};

#define TPM_ACCESS(l)                   (0x0000 | ((l) << 12))
#define TPM_STS(l)                      (0x0018 | ((l) << 12))

int main(void)
{
	int fd;
	void *map;
	volatile uint8_t *access;
	volatile uint8_t *sts;
	unsigned long long i;

	fd = open("/dev/mem", O_RDWR | O_DSYNC);
	if (fd == -1) {
		perror("open");
		return EXIT_FAILURE;
	}
	map = mmap(NULL, 0x5000, PROT_READ | PROT_WRITE, MAP_SHARED, fd,
		   0xfed40000);
	if (map == MAP_FAILED) {
		perror("mmap");
		return EXIT_FAILURE;
	}
	access = (uint8_t *)map + TPM_ACCESS(0);
	sts = (uint8_t *)map + TPM_STS(0);

	i = 0;
	for (;;) {
		struct timespec stop, now;
		uint32_t burstcnt;
		uint8_t sts_read;

		*access = TPM_ACCESS_REQUEST_USE;
		while ((*access & (TPM_ACCESS_ACTIVE_LOCALITY | TPM_ACCESS_VALID)) !=
		       (TPM_ACCESS_ACTIVE_LOCALITY | TPM_ACCESS_VALID))
			;

		sts_read = *sts;
#if 0
		if (sts_read == 0xff)
			sts_read = *sts;
#endif
		if (!(sts_read & TPM_STS_COMMAND_READY)) {
			*sts = TPM_STS_COMMAND_READY;
			while (!(*sts & TPM_STS_COMMAND_READY))
				;
		}

		clock_gettime(CLOCK_MONOTONIC, &stop);
		stop.tv_sec += 1;
		for (;;) {
			burstcnt = ((*(volatile uint32_t *)sts) >> 8) & 0xffff;
			if (burstcnt)
				break;
			clock_gettime(CLOCK_MONOTONIC, &now);
			if (now.tv_sec > stop.tv_sec ||
			    (now.tv_sec == stop.tv_sec &&
			     now.tv_nsec >= stop.tv_nsec)) {
				fprintf(stderr, "Timed out after %llu iterations\n", i);
				i = 0;
				break;
			}
		}

		*access = TPM_ACCESS_ACTIVE_LOCALITY;
		while ((*access & (TPM_ACCESS_ACTIVE_LOCALITY | TPM_ACCESS_VALID)) !=
		       TPM_ACCESS_VALID)
			;
		i++;
	}

	return EXIT_SUCCESS;
}

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH] tpm_tis: work around status register bug in STMicroelectronics TPM
  2020-04-15 22:45 [PATCH] tpm_tis: work around status register bug in STMicroelectronics TPM Omar Sandoval
  2020-04-15 23:51 ` James Bottomley
@ 2020-04-16 17:08 ` Jarkko Sakkinen
  2020-04-16 18:54   ` Omar Sandoval
  1 sibling, 1 reply; 24+ messages in thread
From: Jarkko Sakkinen @ 2020-04-16 17:08 UTC (permalink / raw)
  To: Omar Sandoval; +Cc: Peter Huewe, Jason Gunthorpe, linux-integrity

On Wed, Apr 15, 2020 at 03:45:22PM -0700, Omar Sandoval wrote:
> From: Omar Sandoval <osandov@fb.com>
> 
> We've encountered a particular model of STMicroelectronics TPM that
> transiently returns a bad value in the status register. This causes the
> kernel to believe that the TPM is ready to receive a command when it
> actually isn't, which in turn causes the send to time out in
> get_burstcount(). In testing, reading the status register one extra time
> convinces the TPM to return a valid value.
> 
> Signed-off-by: Omar Sandoval <osandov@fb.com>

So what is the bad value?

/Jarkko

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH] tpm_tis: work around status register bug in STMicroelectronics TPM
  2020-04-15 23:51 ` James Bottomley
  2020-04-16  0:16   ` Omar Sandoval
@ 2020-04-16 17:09   ` Jarkko Sakkinen
  2020-04-16 17:56     ` James Bottomley
  2020-08-27 15:24   ` Jason Andryuk
  2 siblings, 1 reply; 24+ messages in thread
From: Jarkko Sakkinen @ 2020-04-16 17:09 UTC (permalink / raw)
  To: James Bottomley
  Cc: Omar Sandoval, Peter Huewe, Jason Gunthorpe, linux-integrity

On Wed, Apr 15, 2020 at 04:51:39PM -0700, James Bottomley wrote:
> On Wed, 2020-04-15 at 15:45 -0700, Omar Sandoval wrote:
> > From: Omar Sandoval <osandov@fb.com>
> > 
> > We've encountered a particular model of STMicroelectronics TPM that
> > transiently returns a bad value in the status register. This causes
> > the kernel to believe that the TPM is ready to receive a command when
> > it actually isn't, which in turn causes the send to time out in
> > get_burstcount(). In testing, reading the status register one extra
> > time convinces the TPM to return a valid value.
> 
> Interesting, I've got a very early upgradeable nuvoton that seems to be
> behaving like this.
> 
> > Signed-off-by: Omar Sandoval <osandov@fb.com>
> > ---
> >  drivers/char/tpm/tpm_tis_core.c | 12 ++++++++++++
> >  1 file changed, 12 insertions(+)
> > 
> > diff --git a/drivers/char/tpm/tpm_tis_core.c
> > b/drivers/char/tpm/tpm_tis_core.c
> > index 27c6ca031e23..277a21027fc7 100644
> > --- a/drivers/char/tpm/tpm_tis_core.c
> > +++ b/drivers/char/tpm/tpm_tis_core.c
> > @@ -238,6 +238,18 @@ static u8 tpm_tis_status(struct tpm_chip *chip)
> >  	rc = tpm_tis_read8(priv, TPM_STS(priv->locality), &status);
> >  	if (rc < 0)
> >  		return 0;
> > +	/*
> > +	 * Some STMicroelectronics TPMs have a bug where the status
> > register is
> > +	 * sometimes bogus (all 1s) if read immediately after the
> > access
> > +	 * register is written to. Bits 0, 1, and 5 are always
> > supposed to read
> > +	 * as 0, so this is clearly invalid. Reading the register a
> > second time
> > +	 * returns a valid value.
> > +	 */
> > +	if (unlikely(status == 0xff)) {
> > +		rc = tpm_tis_read8(priv, TPM_STS(priv->locality),
> > &status);
> > +		if (rc < 0)
> > +			return 0;
> > +	}
> 
> You theorize that your case is fixed by the second read, but what if it
> isn't and the second read also returns 0xff?  Shouldn't we have a line
> here saying
> 
> if (unlikely(status == 0xff))
> 	status = 0;
> 
> So if we get a second 0xff we just pretend the thing isn't ready?

If it eventually settles, would it be better to poll it for a while?

Also, the commit message is ambiguous. "bad value" can be any random
bit sequence.

/Jarkko

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH] tpm_tis: work around status register bug in STMicroelectronics TPM
  2020-04-16 17:09   ` Jarkko Sakkinen
@ 2020-04-16 17:56     ` James Bottomley
  0 siblings, 0 replies; 24+ messages in thread
From: James Bottomley @ 2020-04-16 17:56 UTC (permalink / raw)
  To: Jarkko Sakkinen
  Cc: Omar Sandoval, Peter Huewe, Jason Gunthorpe, linux-integrity

On Thu, 2020-04-16 at 20:09 +0300, Jarkko Sakkinen wrote:
> On Wed, Apr 15, 2020 at 04:51:39PM -0700, James Bottomley wrote:
> > On Wed, 2020-04-15 at 15:45 -0700, Omar Sandoval wrote:
> > > From: Omar Sandoval <osandov@fb.com>
> > > 
> > > We've encountered a particular model of STMicroelectronics TPM
> > > that
> > > transiently returns a bad value in the status register. This
> > > causes
> > > the kernel to believe that the TPM is ready to receive a command
> > > when
> > > it actually isn't, which in turn causes the send to time out in
> > > get_burstcount(). In testing, reading the status register one
> > > extra
> > > time convinces the TPM to return a valid value.
> > 
> > Interesting, I've got a very early upgradeable nuvoton that seems
> > to be
> > behaving like this.
> > 
> > > Signed-off-by: Omar Sandoval <osandov@fb.com>
> > > ---
> > >  drivers/char/tpm/tpm_tis_core.c | 12 ++++++++++++
> > >  1 file changed, 12 insertions(+)
> > > 
> > > diff --git a/drivers/char/tpm/tpm_tis_core.c
> > > b/drivers/char/tpm/tpm_tis_core.c
> > > index 27c6ca031e23..277a21027fc7 100644
> > > --- a/drivers/char/tpm/tpm_tis_core.c
> > > +++ b/drivers/char/tpm/tpm_tis_core.c
> > > @@ -238,6 +238,18 @@ static u8 tpm_tis_status(struct tpm_chip
> > > *chip)
> > >  	rc = tpm_tis_read8(priv, TPM_STS(priv->locality),
> > > &status);
> > >  	if (rc < 0)
> > >  		return 0;
> > > +	/*
> > > +	 * Some STMicroelectronics TPMs have a bug where the
> > > status
> > > register is
> > > +	 * sometimes bogus (all 1s) if read immediately after
> > > the
> > > access
> > > +	 * register is written to. Bits 0, 1, and 5 are always
> > > supposed to read
> > > +	 * as 0, so this is clearly invalid. Reading the
> > > register a
> > > second time
> > > +	 * returns a valid value.
> > > +	 */
> > > +	if (unlikely(status == 0xff)) {
> > > +		rc = tpm_tis_read8(priv, TPM_STS(priv-
> > > >locality),
> > > &status);
> > > +		if (rc < 0)
> > > +			return 0;
> > > +	}
> > 
> > You theorize that your case is fixed by the second read, but what
> > if it
> > isn't and the second read also returns 0xff?  Shouldn't we have a
> > line
> > here saying
> > 
> > if (unlikely(status == 0xff))
> > 	status = 0;
> > 
> > So if we get a second 0xff we just pretend the thing isn't ready?
> 
> If it eventually settles, would it be better to poll it for a while?

Omar said they'd never seen the double read fail, so I don't think
polling would be helpful in this case.  If we do get a double read of
0xff I think returning 0 is correct which basically means the TPM isn't
ready and the caller needs to wait a bit.  If you look, most of the
callers of tpm_tis_status will do their own wait and retry in this
case, so effectively we're getting the caller to poll for us.

James


> Also, the commit message is ambiguous. "bad value" can be any random
> bit sequence.
> 
> /Jarkko
> 


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH] tpm_tis: work around status register bug in STMicroelectronics TPM
  2020-04-16  0:24     ` Omar Sandoval
@ 2020-04-16 18:02       ` James Bottomley
  2020-04-17 23:55         ` Jarkko Sakkinen
  0 siblings, 1 reply; 24+ messages in thread
From: James Bottomley @ 2020-04-16 18:02 UTC (permalink / raw)
  To: Omar Sandoval
  Cc: Peter Huewe, Jarkko Sakkinen, Jason Gunthorpe, linux-integrity

On Wed, 2020-04-15 at 17:24 -0700, Omar Sandoval wrote:
> On Wed, Apr 15, 2020 at 05:16:05PM -0700, Omar Sandoval wrote:
> > On Wed, Apr 15, 2020 at 04:51:39PM -0700, James Bottomley wrote:
> > > On Wed, 2020-04-15 at 15:45 -0700, Omar Sandoval wrote:
> > > > From: Omar Sandoval <osandov@fb.com>
> > > > 
> > > > We've encountered a particular model of STMicroelectronics TPM
> > > > that transiently returns a bad value in the status register.
> > > > This causes the kernel to believe that the TPM is ready to
> > > > receive a command when it actually isn't, which in turn causes
> > > > the send to time out in get_burstcount(). In testing, reading
> > > > the status register one extra time convinces the TPM to return
> > > > a valid value.
> > > 
> > > Interesting, I've got a very early upgradeable nuvoton that seems
> > > to be behaving like this.
> > 
> > I'll attach the userspace reproducer I used to figure this out. I'd
> > be interested to see if it times out on your TPM, too. Note that it
> > bangs on /dev/mem and assumes that the MMIO address is 0xfed40000.
> > That seems to be the hard-coded address for x86 in the kernel, but
> > just to be safe you might want to check `grep MSFT0101
> > /proc/iomem`.
> 
> Forgot to attach it, of course...


Thanks!  You facebook guys run with interesting kernel options ... I
eventually had to disable CONFIG_STRICT_DEVMEM and rebuild my kernel to
get it to run.

However, the bad news is that this isn't my problem, it seems to be
more timeout related  I get the same symptoms: logs full of

[14570.626594] tpm tpm0: tpm_try_transmit: tpm_send: error -62

and the TPM won't recover until the box is reset.  To get my TPM to be
usable, I have to fiddle our default timeouts like this:

--- a/drivers/char/tpm/tpm.h
+++ b/drivers/char/tpm/tpm.h
@@ -41,8 +41,8 @@ enum tpm_timeout {
        TPM_TIMEOUT_RETRY = 100, /* msecs */
        TPM_TIMEOUT_RANGE_US = 300,     /* usecs */
        TPM_TIMEOUT_POLL = 1,   /* msecs */
-       TPM_TIMEOUT_USECS_MIN = 100,      /* usecs */
-       TPM_TIMEOUT_USECS_MAX = 500      /* usecs */
+       TPM_TIMEOUT_USECS_MIN = 750,      /* usecs */
+       TPM_TIMEOUT_USECS_MAX = 1000,      /* usecs */
 };

But I think the problem is unique to my nuvoton because there haven't
been any other reports of problems like this ... and with these
timeouts my system functions normally in spite of me being a heavy TPM
user.

James


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH] tpm_tis: work around status register bug in STMicroelectronics TPM
  2020-04-16 17:08 ` Jarkko Sakkinen
@ 2020-04-16 18:54   ` Omar Sandoval
  2020-04-17 23:54     ` Jarkko Sakkinen
  0 siblings, 1 reply; 24+ messages in thread
From: Omar Sandoval @ 2020-04-16 18:54 UTC (permalink / raw)
  To: Jarkko Sakkinen; +Cc: Peter Huewe, Jason Gunthorpe, linux-integrity

On Thu, Apr 16, 2020 at 08:08:10PM +0300, Jarkko Sakkinen wrote:
> On Wed, Apr 15, 2020 at 03:45:22PM -0700, Omar Sandoval wrote:
> > From: Omar Sandoval <osandov@fb.com>
> > 
> > We've encountered a particular model of STMicroelectronics TPM that
> > transiently returns a bad value in the status register. This causes the
> > kernel to believe that the TPM is ready to receive a command when it
> > actually isn't, which in turn causes the send to time out in
> > get_burstcount(). In testing, reading the status register one extra time
> > convinces the TPM to return a valid value.
> > 
> > Signed-off-by: Omar Sandoval <osandov@fb.com>
> 
> So what is the bad value?

0xff. Sorry, I thought it would be clear from the code and comment in
the patch itself. Would you prefer for me to repeat it in the commit
message?

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH] tpm_tis: work around status register bug in STMicroelectronics TPM
  2020-04-16 18:54   ` Omar Sandoval
@ 2020-04-17 23:54     ` Jarkko Sakkinen
  0 siblings, 0 replies; 24+ messages in thread
From: Jarkko Sakkinen @ 2020-04-17 23:54 UTC (permalink / raw)
  To: Omar Sandoval; +Cc: Peter Huewe, Jason Gunthorpe, linux-integrity

On Thu, Apr 16, 2020 at 11:54:33AM -0700, Omar Sandoval wrote:
> On Thu, Apr 16, 2020 at 08:08:10PM +0300, Jarkko Sakkinen wrote:
> > On Wed, Apr 15, 2020 at 03:45:22PM -0700, Omar Sandoval wrote:
> > > From: Omar Sandoval <osandov@fb.com>
> > > 
> > > We've encountered a particular model of STMicroelectronics TPM that
> > > transiently returns a bad value in the status register. This causes the
> > > kernel to believe that the TPM is ready to receive a command when it
> > > actually isn't, which in turn causes the send to time out in
> > > get_burstcount(). In testing, reading the status register one extra time
> > > convinces the TPM to return a valid value.
> > > 
> > > Signed-off-by: Omar Sandoval <osandov@fb.com>
> > 
> > So what is the bad value?
> 
> 0xff. Sorry, I thought it would be clear from the code and comment in
> the patch itself. Would you prefer for me to repeat it in the commit
> message?

It is part of the problem description where as "bad value" does
not have any particular meaning.

/Jarkko

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH] tpm_tis: work around status register bug in STMicroelectronics TPM
  2020-04-16 18:02       ` James Bottomley
@ 2020-04-17 23:55         ` Jarkko Sakkinen
  2020-04-18  0:12           ` James Bottomley
  0 siblings, 1 reply; 24+ messages in thread
From: Jarkko Sakkinen @ 2020-04-17 23:55 UTC (permalink / raw)
  To: James Bottomley
  Cc: Omar Sandoval, Peter Huewe, Jason Gunthorpe, linux-integrity

On Thu, Apr 16, 2020 at 11:02:51AM -0700, James Bottomley wrote:
> On Wed, 2020-04-15 at 17:24 -0700, Omar Sandoval wrote:
> > On Wed, Apr 15, 2020 at 05:16:05PM -0700, Omar Sandoval wrote:
> > > On Wed, Apr 15, 2020 at 04:51:39PM -0700, James Bottomley wrote:
> > > > On Wed, 2020-04-15 at 15:45 -0700, Omar Sandoval wrote:
> > > > > From: Omar Sandoval <osandov@fb.com>
> > > > > 
> > > > > We've encountered a particular model of STMicroelectronics TPM
> > > > > that transiently returns a bad value in the status register.
> > > > > This causes the kernel to believe that the TPM is ready to
> > > > > receive a command when it actually isn't, which in turn causes
> > > > > the send to time out in get_burstcount(). In testing, reading
> > > > > the status register one extra time convinces the TPM to return
> > > > > a valid value.
> > > > 
> > > > Interesting, I've got a very early upgradeable nuvoton that seems
> > > > to be behaving like this.
> > > 
> > > I'll attach the userspace reproducer I used to figure this out. I'd
> > > be interested to see if it times out on your TPM, too. Note that it
> > > bangs on /dev/mem and assumes that the MMIO address is 0xfed40000.
> > > That seems to be the hard-coded address for x86 in the kernel, but
> > > just to be safe you might want to check `grep MSFT0101
> > > /proc/iomem`.
> > 
> > Forgot to attach it, of course...
> 
> 
> Thanks!  You facebook guys run with interesting kernel options ... I
> eventually had to disable CONFIG_STRICT_DEVMEM and rebuild my kernel to
> get it to run.
> 
> However, the bad news is that this isn't my problem, it seems to be
> more timeout related  I get the same symptoms: logs full of
> 
> [14570.626594] tpm tpm0: tpm_try_transmit: tpm_send: error -62
> 
> and the TPM won't recover until the box is reset.  To get my TPM to be
> usable, I have to fiddle our default timeouts like this:
> 
> --- a/drivers/char/tpm/tpm.h
> +++ b/drivers/char/tpm/tpm.h
> @@ -41,8 +41,8 @@ enum tpm_timeout {
>         TPM_TIMEOUT_RETRY = 100, /* msecs */
>         TPM_TIMEOUT_RANGE_US = 300,     /* usecs */
>         TPM_TIMEOUT_POLL = 1,   /* msecs */
> -       TPM_TIMEOUT_USECS_MIN = 100,      /* usecs */
> -       TPM_TIMEOUT_USECS_MAX = 500      /* usecs */
> +       TPM_TIMEOUT_USECS_MIN = 750,      /* usecs */
> +       TPM_TIMEOUT_USECS_MAX = 1000,      /* usecs */
>  };
> 
> But I think the problem is unique to my nuvoton because there haven't
> been any other reports of problems like this ... and with these
> timeouts my system functions normally in spite of me being a heavy TPM
> user.

What downsides there would be to increase these a bit?

/Jarkko

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH] tpm_tis: work around status register bug in STMicroelectronics TPM
  2020-04-17 23:55         ` Jarkko Sakkinen
@ 2020-04-18  0:12           ` James Bottomley
  2020-04-20 20:46             ` Jarkko Sakkinen
  0 siblings, 1 reply; 24+ messages in thread
From: James Bottomley @ 2020-04-18  0:12 UTC (permalink / raw)
  To: Jarkko Sakkinen
  Cc: Omar Sandoval, Peter Huewe, Jason Gunthorpe, linux-integrity

On Sat, 2020-04-18 at 02:55 +0300, Jarkko Sakkinen wrote:
> On Thu, Apr 16, 2020 at 11:02:51AM -0700, James Bottomley wrote:
> > On Wed, 2020-04-15 at 17:24 -0700, Omar Sandoval wrote:
> > > On Wed, Apr 15, 2020 at 05:16:05PM -0700, Omar Sandoval wrote:
> > > > On Wed, Apr 15, 2020 at 04:51:39PM -0700, James Bottomley
> > > > wrote:
> > > > > On Wed, 2020-04-15 at 15:45 -0700, Omar Sandoval wrote:
> > > > > > From: Omar Sandoval <osandov@fb.com>
> > > > > > 
> > > > > > We've encountered a particular model of STMicroelectronics
> > > > > > TPM
> > > > > > that transiently returns a bad value in the status
> > > > > > register.
> > > > > > This causes the kernel to believe that the TPM is ready to
> > > > > > receive a command when it actually isn't, which in turn
> > > > > > causes
> > > > > > the send to time out in get_burstcount(). In testing,
> > > > > > reading
> > > > > > the status register one extra time convinces the TPM to
> > > > > > return
> > > > > > a valid value.
> > > > > 
> > > > > Interesting, I've got a very early upgradeable nuvoton that
> > > > > seems
> > > > > to be behaving like this.
> > > > 
> > > > I'll attach the userspace reproducer I used to figure this out.
> > > > I'd
> > > > be interested to see if it times out on your TPM, too. Note
> > > > that it
> > > > bangs on /dev/mem and assumes that the MMIO address is
> > > > 0xfed40000.
> > > > That seems to be the hard-coded address for x86 in the kernel,
> > > > but
> > > > just to be safe you might want to check `grep MSFT0101
> > > > /proc/iomem`.
> > > 
> > > Forgot to attach it, of course...
> > 
> > 
> > Thanks!  You facebook guys run with interesting kernel options ...
> > I
> > eventually had to disable CONFIG_STRICT_DEVMEM and rebuild my
> > kernel to
> > get it to run.
> > 
> > However, the bad news is that this isn't my problem, it seems to be
> > more timeout related  I get the same symptoms: logs full of
> > 
> > [14570.626594] tpm tpm0: tpm_try_transmit: tpm_send: error -62
> > 
> > and the TPM won't recover until the box is reset.  To get my TPM to
> > be
> > usable, I have to fiddle our default timeouts like this:
> > 
> > --- a/drivers/char/tpm/tpm.h
> > +++ b/drivers/char/tpm/tpm.h
> > @@ -41,8 +41,8 @@ enum tpm_timeout {
> >         TPM_TIMEOUT_RETRY = 100, /* msecs */
> >         TPM_TIMEOUT_RANGE_US = 300,     /* usecs */
> >         TPM_TIMEOUT_POLL = 1,   /* msecs */
> > -       TPM_TIMEOUT_USECS_MIN = 100,      /* usecs */
> > -       TPM_TIMEOUT_USECS_MAX = 500      /* usecs */
> > +       TPM_TIMEOUT_USECS_MIN = 750,      /* usecs */
> > +       TPM_TIMEOUT_USECS_MAX = 1000,      /* usecs */
> >  };
> > 
> > But I think the problem is unique to my nuvoton because there
> > haven't
> > been any other reports of problems like this ... and with these
> > timeouts my system functions normally in spite of me being a heavy
> > TPM
> > user.
> 
> What downsides there would be to increase these a bit?

PCR writes would take longer meaning IMA initialization would become
slower.

James


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH] tpm_tis: work around status register bug in STMicroelectronics TPM
  2020-04-18  0:12           ` James Bottomley
@ 2020-04-20 20:46             ` Jarkko Sakkinen
  2020-04-20 22:28               ` James Bottomley
  0 siblings, 1 reply; 24+ messages in thread
From: Jarkko Sakkinen @ 2020-04-20 20:46 UTC (permalink / raw)
  To: James Bottomley, zohar
  Cc: Omar Sandoval, Peter Huewe, Jason Gunthorpe, linux-integrity

On Fri, Apr 17, 2020 at 05:12:28PM -0700, James Bottomley wrote:
> On Sat, 2020-04-18 at 02:55 +0300, Jarkko Sakkinen wrote:
> > On Thu, Apr 16, 2020 at 11:02:51AM -0700, James Bottomley wrote:
> > > On Wed, 2020-04-15 at 17:24 -0700, Omar Sandoval wrote:
> > > > On Wed, Apr 15, 2020 at 05:16:05PM -0700, Omar Sandoval wrote:
> > > > > On Wed, Apr 15, 2020 at 04:51:39PM -0700, James Bottomley
> > > > > wrote:
> > > > > > On Wed, 2020-04-15 at 15:45 -0700, Omar Sandoval wrote:
> > > > > > > From: Omar Sandoval <osandov@fb.com>
> > > > > > > 
> > > > > > > We've encountered a particular model of STMicroelectronics
> > > > > > > TPM
> > > > > > > that transiently returns a bad value in the status
> > > > > > > register.
> > > > > > > This causes the kernel to believe that the TPM is ready to
> > > > > > > receive a command when it actually isn't, which in turn
> > > > > > > causes
> > > > > > > the send to time out in get_burstcount(). In testing,
> > > > > > > reading
> > > > > > > the status register one extra time convinces the TPM to
> > > > > > > return
> > > > > > > a valid value.
> > > > > > 
> > > > > > Interesting, I've got a very early upgradeable nuvoton that
> > > > > > seems
> > > > > > to be behaving like this.
> > > > > 
> > > > > I'll attach the userspace reproducer I used to figure this out.
> > > > > I'd
> > > > > be interested to see if it times out on your TPM, too. Note
> > > > > that it
> > > > > bangs on /dev/mem and assumes that the MMIO address is
> > > > > 0xfed40000.
> > > > > That seems to be the hard-coded address for x86 in the kernel,
> > > > > but
> > > > > just to be safe you might want to check `grep MSFT0101
> > > > > /proc/iomem`.
> > > > 
> > > > Forgot to attach it, of course...
> > > 
> > > 
> > > Thanks!  You facebook guys run with interesting kernel options ...
> > > I
> > > eventually had to disable CONFIG_STRICT_DEVMEM and rebuild my
> > > kernel to
> > > get it to run.
> > > 
> > > However, the bad news is that this isn't my problem, it seems to be
> > > more timeout related  I get the same symptoms: logs full of
> > > 
> > > [14570.626594] tpm tpm0: tpm_try_transmit: tpm_send: error -62
> > > 
> > > and the TPM won't recover until the box is reset.  To get my TPM to
> > > be
> > > usable, I have to fiddle our default timeouts like this:
> > > 
> > > --- a/drivers/char/tpm/tpm.h
> > > +++ b/drivers/char/tpm/tpm.h
> > > @@ -41,8 +41,8 @@ enum tpm_timeout {
> > >         TPM_TIMEOUT_RETRY = 100, /* msecs */
> > >         TPM_TIMEOUT_RANGE_US = 300,     /* usecs */
> > >         TPM_TIMEOUT_POLL = 1,   /* msecs */
> > > -       TPM_TIMEOUT_USECS_MIN = 100,      /* usecs */
> > > -       TPM_TIMEOUT_USECS_MAX = 500      /* usecs */
> > > +       TPM_TIMEOUT_USECS_MIN = 750,      /* usecs */
> > > +       TPM_TIMEOUT_USECS_MAX = 1000,      /* usecs */
> > >  };
> > > 
> > > But I think the problem is unique to my nuvoton because there
> > > haven't
> > > been any other reports of problems like this ... and with these
> > > timeouts my system functions normally in spite of me being a heavy
> > > TPM
> > > user.
> > 
> > What downsides there would be to increase these a bit?
> 
> PCR writes would take longer meaning IMA initialization would become
> slower.

Does it matter?

/Jarkko

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH] tpm_tis: work around status register bug in STMicroelectronics TPM
  2020-04-20 20:46             ` Jarkko Sakkinen
@ 2020-04-20 22:28               ` James Bottomley
  2020-04-21 14:36                 ` Mimi Zohar
  2020-04-21 20:23                 ` Jarkko Sakkinen
  0 siblings, 2 replies; 24+ messages in thread
From: James Bottomley @ 2020-04-20 22:28 UTC (permalink / raw)
  To: Jarkko Sakkinen, zohar
  Cc: Omar Sandoval, Peter Huewe, Jason Gunthorpe, linux-integrity

On Mon, 2020-04-20 at 23:46 +0300, Jarkko Sakkinen wrote:
> On Fri, Apr 17, 2020 at 05:12:28PM -0700, James Bottomley wrote:
> > On Sat, 2020-04-18 at 02:55 +0300, Jarkko Sakkinen wrote:
> > > On Thu, Apr 16, 2020 at 11:02:51AM -0700, James Bottomley wrote:
> > > > On Wed, 2020-04-15 at 17:24 -0700, Omar Sandoval wrote:
> > > > > On Wed, Apr 15, 2020 at 05:16:05PM -0700, Omar Sandoval
> > > > > wrote:
> > > > > > On Wed, Apr 15, 2020 at 04:51:39PM -0700, James Bottomley
> > > > > > wrote:
> > > > > > > On Wed, 2020-04-15 at 15:45 -0700, Omar Sandoval wrote:
> > > > > > > > From: Omar Sandoval <osandov@fb.com>
> > > > > > > > 
> > > > > > > > We've encountered a particular model of
> > > > > > > > STMicroelectronics
> > > > > > > > TPM
> > > > > > > > that transiently returns a bad value in the status
> > > > > > > > register.
> > > > > > > > This causes the kernel to believe that the TPM is ready
> > > > > > > > to
> > > > > > > > receive a command when it actually isn't, which in turn
> > > > > > > > causes
> > > > > > > > the send to time out in get_burstcount(). In testing,
> > > > > > > > reading
> > > > > > > > the status register one extra time convinces the TPM to
> > > > > > > > return
> > > > > > > > a valid value.
> > > > > > > 
> > > > > > > Interesting, I've got a very early upgradeable nuvoton
> > > > > > > that seems to be behaving like this.
> > > > > > 
> > > > > > I'll attach the userspace reproducer I used to figure this
> > > > > > out. I'd be interested to see if it times out on your TPM,
> > > > > > too. Note that it bangs on /dev/mem and assumes that the
> > > > > > MMIO address is 0xfed40000. That seems to be the hard-
> > > > > > coded 
> > > > > > address for x86 in the kernel, but just to be safe you
> > > > > > might want to check `grep MSFT0101 /proc/iomem`.
> > > > > 
> > > > > Forgot to attach it, of course...
> > > > 
> > > > 
> > > > Thanks!  You facebook guys run with interesting kernel options
> > > > ... I eventually had to disable CONFIG_STRICT_DEVMEM and
> > > > rebuild my kernel to get it to run.
> > > > 
> > > > However, the bad news is that this isn't my problem, it seems
> > > > to be more timeout related  I get the same symptoms: logs full
> > > > of
> > > > 
> > > > [14570.626594] tpm tpm0: tpm_try_transmit: tpm_send: error -62
> > > > 
> > > > and the TPM won't recover until the box is reset.  To get my
> > > > TPM to be usable, I have to fiddle our default timeouts like
> > > > this:
> > > > 
> > > > --- a/drivers/char/tpm/tpm.h
> > > > +++ b/drivers/char/tpm/tpm.h
> > > > @@ -41,8 +41,8 @@ enum tpm_timeout {
> > > >         TPM_TIMEOUT_RETRY = 100, /* msecs */
> > > >         TPM_TIMEOUT_RANGE_US = 300,     /* usecs */
> > > >         TPM_TIMEOUT_POLL = 1,   /* msecs */
> > > > -       TPM_TIMEOUT_USECS_MIN = 100,      /* usecs */
> > > > -       TPM_TIMEOUT_USECS_MAX = 500      /* usecs */
> > > > +       TPM_TIMEOUT_USECS_MIN = 750,      /* usecs */
> > > > +       TPM_TIMEOUT_USECS_MAX = 1000,      /* usecs */
> > > >  };
> > > > 
> > > > But I think the problem is unique to my nuvoton because there
> > > > haven't been any other reports of problems like this ... and
> > > > with these timeouts my system functions normally in spite of me
> > > > being a heavy TPM user.
> > > 
> > > What downsides there would be to increase these a bit?
> > 
> > PCR writes would take longer meaning IMA initialization would
> > become slower.
> 
> Does it matter?

Not if you're the one telling Mimi ... and I'm at least 1 mile from the
blast radius.

But more seriously: Nayna Jain did a series of patches improving the
time it takes to poll the TPM for operations precisely because the TPM
PCR extend was going so slowly:

https://lore.kernel.org/linux-integrity/20180516055125.5685-1-nayna@linux.vnet.ibm.com/

I also reported the issue shortly after that patch was integrated, but
everyone seemed to think it was just a problem with my TPM chip (it's
an early Nuvoton field upgraded to 2.0):

https://lore.kernel.org/linux-integrity/1531328689.3260.8.camel@HansenPartnership.com/

James


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH] tpm_tis: work around status register bug in STMicroelectronics TPM
  2020-04-20 22:28               ` James Bottomley
@ 2020-04-21 14:36                 ` Mimi Zohar
  2020-04-21 20:25                   ` Jarkko Sakkinen
  2020-04-21 20:23                 ` Jarkko Sakkinen
  1 sibling, 1 reply; 24+ messages in thread
From: Mimi Zohar @ 2020-04-21 14:36 UTC (permalink / raw)
  To: James Bottomley, Jarkko Sakkinen
  Cc: Omar Sandoval, Peter Huewe, Jason Gunthorpe, linux-integrity

On Mon, 2020-04-20 at 15:28 -0700, James Bottomley wrote:
> On Mon, 2020-04-20 at 23:46 +0300, Jarkko Sakkinen wrote:

<snip>

> But more seriously: Nayna Jain did a series of patches improving the
> time it takes to poll the TPM for operations precisely because the TPM
> PCR extend was going so slowly:
> 
> https://lore.kernel.org/linux-integrity/20180516055125.5685-1-nayna@linux.vnet.ibm.com/

The original reason for us needing to improve the TPM performance was
due to the kernel scheduler change.  Refer to commit a233a0289cf9
("tpm: msleep() delays - replace with usleep_range() in i2c nuvoton
driver").  That scheduler change prevented systems from booting.
 Bisecting the kernel to figure out the problem wasn't very
productive.

At least any TPM changes that affect the TPM performance really need
to take into account IMA requirements.

thanks,

Mimi


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH] tpm_tis: work around status register bug in STMicroelectronics TPM
  2020-04-20 22:28               ` James Bottomley
  2020-04-21 14:36                 ` Mimi Zohar
@ 2020-04-21 20:23                 ` Jarkko Sakkinen
  2020-04-21 22:08                   ` James Bottomley
  1 sibling, 1 reply; 24+ messages in thread
From: Jarkko Sakkinen @ 2020-04-21 20:23 UTC (permalink / raw)
  To: James Bottomley
  Cc: zohar, Omar Sandoval, Peter Huewe, Jason Gunthorpe, linux-integrity

On Mon, Apr 20, 2020 at 03:28:06PM -0700, James Bottomley wrote:
> On Mon, 2020-04-20 at 23:46 +0300, Jarkko Sakkinen wrote:
> > On Fri, Apr 17, 2020 at 05:12:28PM -0700, James Bottomley wrote:
> > > On Sat, 2020-04-18 at 02:55 +0300, Jarkko Sakkinen wrote:
> > > > On Thu, Apr 16, 2020 at 11:02:51AM -0700, James Bottomley wrote:
> > > > > On Wed, 2020-04-15 at 17:24 -0700, Omar Sandoval wrote:
> > > > > > On Wed, Apr 15, 2020 at 05:16:05PM -0700, Omar Sandoval
> > > > > > wrote:
> > > > > > > On Wed, Apr 15, 2020 at 04:51:39PM -0700, James Bottomley
> > > > > > > wrote:
> > > > > > > > On Wed, 2020-04-15 at 15:45 -0700, Omar Sandoval wrote:
> > > > > > > > > From: Omar Sandoval <osandov@fb.com>
> > > > > > > > > 
> > > > > > > > > We've encountered a particular model of
> > > > > > > > > STMicroelectronics
> > > > > > > > > TPM
> > > > > > > > > that transiently returns a bad value in the status
> > > > > > > > > register.
> > > > > > > > > This causes the kernel to believe that the TPM is ready
> > > > > > > > > to
> > > > > > > > > receive a command when it actually isn't, which in turn
> > > > > > > > > causes
> > > > > > > > > the send to time out in get_burstcount(). In testing,
> > > > > > > > > reading
> > > > > > > > > the status register one extra time convinces the TPM to
> > > > > > > > > return
> > > > > > > > > a valid value.
> > > > > > > > 
> > > > > > > > Interesting, I've got a very early upgradeable nuvoton
> > > > > > > > that seems to be behaving like this.
> > > > > > > 
> > > > > > > I'll attach the userspace reproducer I used to figure this
> > > > > > > out. I'd be interested to see if it times out on your TPM,
> > > > > > > too. Note that it bangs on /dev/mem and assumes that the
> > > > > > > MMIO address is 0xfed40000. That seems to be the hard-
> > > > > > > coded 
> > > > > > > address for x86 in the kernel, but just to be safe you
> > > > > > > might want to check `grep MSFT0101 /proc/iomem`.
> > > > > > 
> > > > > > Forgot to attach it, of course...
> > > > > 
> > > > > 
> > > > > Thanks!  You facebook guys run with interesting kernel options
> > > > > ... I eventually had to disable CONFIG_STRICT_DEVMEM and
> > > > > rebuild my kernel to get it to run.
> > > > > 
> > > > > However, the bad news is that this isn't my problem, it seems
> > > > > to be more timeout related  I get the same symptoms: logs full
> > > > > of
> > > > > 
> > > > > [14570.626594] tpm tpm0: tpm_try_transmit: tpm_send: error -62
> > > > > 
> > > > > and the TPM won't recover until the box is reset.  To get my
> > > > > TPM to be usable, I have to fiddle our default timeouts like
> > > > > this:
> > > > > 
> > > > > --- a/drivers/char/tpm/tpm.h
> > > > > +++ b/drivers/char/tpm/tpm.h
> > > > > @@ -41,8 +41,8 @@ enum tpm_timeout {
> > > > >         TPM_TIMEOUT_RETRY = 100, /* msecs */
> > > > >         TPM_TIMEOUT_RANGE_US = 300,     /* usecs */
> > > > >         TPM_TIMEOUT_POLL = 1,   /* msecs */
> > > > > -       TPM_TIMEOUT_USECS_MIN = 100,      /* usecs */
> > > > > -       TPM_TIMEOUT_USECS_MAX = 500      /* usecs */
> > > > > +       TPM_TIMEOUT_USECS_MIN = 750,      /* usecs */
> > > > > +       TPM_TIMEOUT_USECS_MAX = 1000,      /* usecs */
> > > > >  };
> > > > > 
> > > > > But I think the problem is unique to my nuvoton because there
> > > > > haven't been any other reports of problems like this ... and
> > > > > with these timeouts my system functions normally in spite of me
> > > > > being a heavy TPM user.
> > > > 
> > > > What downsides there would be to increase these a bit?
> > > 
> > > PCR writes would take longer meaning IMA initialization would
> > > become slower.
> > 
> > Does it matter?
> 
> Not if you're the one telling Mimi ... and I'm at least 1 mile from the
> blast radius.
> 
> But more seriously: Nayna Jain did a series of patches improving the
> time it takes to poll the TPM for operations precisely because the TPM
> PCR extend was going so slowly:
> 
> https://lore.kernel.org/linux-integrity/20180516055125.5685-1-nayna@linux.vnet.ibm.com/
> 
> I also reported the issue shortly after that patch was integrated, but
> everyone seemed to think it was just a problem with my TPM chip (it's
> an early Nuvoton field upgraded to 2.0):
> 
> https://lore.kernel.org/linux-integrity/1531328689.3260.8.camel@HansenPartnership.com/

What if we make it dynamic? I.e. if the init code gets -ETIME, then
increase the timeouts?

/Jarkko

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH] tpm_tis: work around status register bug in STMicroelectronics TPM
  2020-04-21 14:36                 ` Mimi Zohar
@ 2020-04-21 20:25                   ` Jarkko Sakkinen
  2020-04-21 20:31                     ` Mimi Zohar
  0 siblings, 1 reply; 24+ messages in thread
From: Jarkko Sakkinen @ 2020-04-21 20:25 UTC (permalink / raw)
  To: Mimi Zohar
  Cc: James Bottomley, Omar Sandoval, Peter Huewe, Jason Gunthorpe,
	linux-integrity

On Tue, Apr 21, 2020 at 10:36:04AM -0400, Mimi Zohar wrote:
> On Mon, 2020-04-20 at 15:28 -0700, James Bottomley wrote:
> > On Mon, 2020-04-20 at 23:46 +0300, Jarkko Sakkinen wrote:
> 
> <snip>
> 
> > But more seriously: Nayna Jain did a series of patches improving the
> > time it takes to poll the TPM for operations precisely because the TPM
> > PCR extend was going so slowly:
> > 
> > https://lore.kernel.org/linux-integrity/20180516055125.5685-1-nayna@linux.vnet.ibm.com/
> 
> The original reason for us needing to improve the TPM performance was
> due to the kernel scheduler change.  Refer to commit a233a0289cf9
> ("tpm: msleep() delays - replace with usleep_range() in i2c nuvoton
> driver").  That scheduler change prevented systems from booting.
>  Bisecting the kernel to figure out the problem wasn't very
> productive.
> 
> At least any TPM changes that affect the TPM performance really need
> to take into account IMA requirements.

Thanks Mimi.

With my dynamic proposal it would work as it works now for system
where it worked anyway, and would fix the systems where timeouts
were too short.

/Jarkko

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH] tpm_tis: work around status register bug in STMicroelectronics TPM
  2020-04-21 20:25                   ` Jarkko Sakkinen
@ 2020-04-21 20:31                     ` Mimi Zohar
  0 siblings, 0 replies; 24+ messages in thread
From: Mimi Zohar @ 2020-04-21 20:31 UTC (permalink / raw)
  To: Jarkko Sakkinen
  Cc: James Bottomley, Omar Sandoval, Peter Huewe, Jason Gunthorpe,
	linux-integrity

On Tue, 2020-04-21 at 23:25 +0300, Jarkko Sakkinen wrote:
> On Tue, Apr 21, 2020 at 10:36:04AM -0400, Mimi Zohar wrote:
> > On Mon, 2020-04-20 at 15:28 -0700, James Bottomley wrote:
> > > On Mon, 2020-04-20 at 23:46 +0300, Jarkko Sakkinen wrote:
> > 
> > <snip>
> > 
> > > But more seriously: Nayna Jain did a series of patches improving the
> > > time it takes to poll the TPM for operations precisely because the TPM
> > > PCR extend was going so slowly:
> > > 
> > > https://lore.kernel.org/linux-integrity/20180516055125.5685-1-nayna@linux.vnet.ibm.com/
> > 
> > The original reason for us needing to improve the TPM performance was
> > due to the kernel scheduler change.  Refer to commit a233a0289cf9
> > ("tpm: msleep() delays - replace with usleep_range() in i2c nuvoton
> > driver").  That scheduler change prevented systems from booting.
> >  Bisecting the kernel to figure out the problem wasn't very
> > productive.
> > 
> > At least any TPM changes that affect the TPM performance really need
> > to take into account IMA requirements.
> 
> Thanks Mimi.
> 
> With my dynamic proposal it would work as it works now for system
> where it worked anyway, and would fix the systems where timeouts
> were too short.

Sure, but please remember this thread started out addressing an STM
TPM bug, not James' timeouts.  This part of the thread should be re-
named.

Mimi


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH] tpm_tis: work around status register bug in STMicroelectronics TPM
  2020-04-21 20:23                 ` Jarkko Sakkinen
@ 2020-04-21 22:08                   ` James Bottomley
  0 siblings, 0 replies; 24+ messages in thread
From: James Bottomley @ 2020-04-21 22:08 UTC (permalink / raw)
  To: Jarkko Sakkinen
  Cc: zohar, Omar Sandoval, Peter Huewe, Jason Gunthorpe, linux-integrity

On Tue, 2020-04-21 at 23:23 +0300, Jarkko Sakkinen wrote:
> On Mon, Apr 20, 2020 at 03:28:06PM -0700, James Bottomley wrote:
> > On Mon, 2020-04-20 at 23:46 +0300, Jarkko Sakkinen wrote:
[...]
> > > Does it matter?
> > 
> > Not if you're the one telling Mimi ... and I'm at least 1 mile from
> > the blast radius.
> > 
> > But more seriously: Nayna Jain did a series of patches improving
> > the time it takes to poll the TPM for operations precisely because
> > the TPM PCR extend was going so slowly:
> > 
> > https://lore.kernel.org/linux-integrity/20180516055125.5685-1-nayna
> > @linux.vnet.ibm.com/
> > 
> > I also reported the issue shortly after that patch was integrated,
> > but everyone seemed to think it was just a problem with my TPM chip
> > (it's an early Nuvoton field upgraded to 2.0):
> > 
> > https://lore.kernel.org/linux-integrity/1531328689.3260.8.camel@Han
> > senPartnership.com/
> 
> What if we make it dynamic? I.e. if the init code gets -ETIME, then
> increase the timeouts?

The problem is detecting that you need the updated timeouts.  My tpm
doesn't fail in init.  In fact it manages quite a healthy amount of key
operations before going offline.  I'm a heavy TPM key user: I use RSA
keys for two VPNs, all my ssh keys and so on I got tired of the time
RSA takes, so my gpg keys are elliptic curve, but they're the only
ones.  By the time my TPM fails, I've usually at least booted my
desktop, which means several tens of RSA operations have already
happened in the TPM.

James


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH] tpm_tis: work around status register bug in STMicroelectronics TPM
  2020-04-15 23:51 ` James Bottomley
  2020-04-16  0:16   ` Omar Sandoval
  2020-04-16 17:09   ` Jarkko Sakkinen
@ 2020-08-27 15:24   ` Jason Andryuk
  2020-08-28 23:18     ` Jarkko Sakkinen
  2 siblings, 1 reply; 24+ messages in thread
From: Jason Andryuk @ 2020-08-27 15:24 UTC (permalink / raw)
  To: James Bottomley, Omar Sandoval, Peter Huewe, Jarkko Sakkinen,
	Jason Gunthorpe, linux-integrity

James Bottomley wrote:
>On Wed, 2020-04-15 at 15:45 -0700, Omar Sandoval wrote:
>> From: Omar Sandoval <osandov@fb.com>
>> 
>> We've encountered a particular model of STMicroelectronics TPM that
>> transiently returns a bad value in the status register. This causes
>> the kernel to believe that the TPM is ready to receive a command when
>> it actually isn't, which in turn causes the send to time out in
>> get_burstcount(). In testing, reading the status register one extra
>> time convinces the TPM to return a valid value.
>
>Interesting, I've got a very early upgradeable nuvoton that seems to be
>behaving like this.
>
>> Signed-off-by: Omar Sandoval <osandov@fb.com>
>> ---
>>  drivers/char/tpm/tpm_tis_core.c | 12 ++++++++++++
>>  1 file changed, 12 insertions(+)
>> 
>> diff --git a/drivers/char/tpm/tpm_tis_core.c
>> b/drivers/char/tpm/tpm_tis_core.c
>> index 27c6ca031e23..277a21027fc7 100644
>> --- a/drivers/char/tpm/tpm_tis_core.c
>> +++ b/drivers/char/tpm/tpm_tis_core.c
>> @@ -238,6 +238,18 @@ static u8 tpm_tis_status(struct tpm_chip *chip)
>>  	rc = tpm_tis_read8(priv, TPM_STS(priv->locality), &status);
>>  	if (rc < 0)
>>  		return 0;
>> +	/*
>> +	 * Some STMicroelectronics TPMs have a bug where the status
>> register is
>> +	 * sometimes bogus (all 1s) if read immediately after the
>> access
>> +	 * register is written to. Bits 0, 1, and 5 are always
>> supposed to read
>> +	 * as 0, so this is clearly invalid. Reading the register a
>> second time
>> +	 * returns a valid value.
>> +	 */
>> +	if (unlikely(status == 0xff)) {
>> +		rc = tpm_tis_read8(priv, TPM_STS(priv->locality),
>> &status);
>> +		if (rc < 0)
>> +			return 0;
>> +	}
>
>You theorize that your case is fixed by the second read, but what if it
>isn't and the second read also returns 0xff?  Shouldn't we have a line
>here saying
>
>if (unlikely(status == 0xff))
>	status = 0;
>
>So if we get a second 0xff we just pretend the thing isn't ready?

Thanks for the fix, Omar!

I tried the patch and it helps with STM TPM2 issues where commands fail
with the kernel reporting:
tpm tpm0: Unable to read burstcount
tpm tpm0: tpm_try_transmit: send(): error -16

My testing was with 5.4, and I'd like to see this CC-ed stable.

When trying to diagnose the issue before finding this patch, I found it
was timing sensitive.  I was seeing failures in the OpenXT installer.
The system is basically idle when issuing TPM commands which frequently
failed.  The same hardware booted into a Fedora Live USB image didn't
have any TPM command failures.  One notable difference between the two
is Fedora is CONFIG_PREEMPT=y and OpenXT is CONFIG_PREEMPT_NONE=y.
Switching OpenXT to PREEMPT=y helped some, but there were still some
issues with commands failing.  The second interesting thing was running tpm
commands in OpenXT under trace-cmd let them succeed.  I guess that was enough
to throw the timing off.

Anyway, I'd like to see this patch applied, please.

Thanks,
Jason

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH] tpm_tis: work around status register bug in STMicroelectronics TPM
  2020-08-27 15:24   ` Jason Andryuk
@ 2020-08-28 23:18     ` Jarkko Sakkinen
  2020-08-29  0:12       ` Jason Andryuk
  0 siblings, 1 reply; 24+ messages in thread
From: Jarkko Sakkinen @ 2020-08-28 23:18 UTC (permalink / raw)
  To: Jason Andryuk
  Cc: James Bottomley, Omar Sandoval, Peter Huewe, Jason Gunthorpe,
	linux-integrity

On Thu, Aug 27, 2020 at 11:24:45AM -0400, Jason Andryuk wrote:
> James Bottomley wrote:
> >On Wed, 2020-04-15 at 15:45 -0700, Omar Sandoval wrote:
> >> From: Omar Sandoval <osandov@fb.com>
> >> 
> >> We've encountered a particular model of STMicroelectronics TPM that
> >> transiently returns a bad value in the status register. This causes
> >> the kernel to believe that the TPM is ready to receive a command when
> >> it actually isn't, which in turn causes the send to time out in
> >> get_burstcount(). In testing, reading the status register one extra
> >> time convinces the TPM to return a valid value.
> >
> >Interesting, I've got a very early upgradeable nuvoton that seems to be
> >behaving like this.
> >
> >> Signed-off-by: Omar Sandoval <osandov@fb.com>
> >> ---
> >>  drivers/char/tpm/tpm_tis_core.c | 12 ++++++++++++
> >>  1 file changed, 12 insertions(+)
> >> 
> >> diff --git a/drivers/char/tpm/tpm_tis_core.c
> >> b/drivers/char/tpm/tpm_tis_core.c
> >> index 27c6ca031e23..277a21027fc7 100644
> >> --- a/drivers/char/tpm/tpm_tis_core.c
> >> +++ b/drivers/char/tpm/tpm_tis_core.c
> >> @@ -238,6 +238,18 @@ static u8 tpm_tis_status(struct tpm_chip *chip)
> >>  	rc = tpm_tis_read8(priv, TPM_STS(priv->locality), &status);
> >>  	if (rc < 0)
> >>  		return 0;
> >> +	/*
> >> +	 * Some STMicroelectronics TPMs have a bug where the status
> >> register is
> >> +	 * sometimes bogus (all 1s) if read immediately after the
> >> access
> >> +	 * register is written to. Bits 0, 1, and 5 are always
> >> supposed to read
> >> +	 * as 0, so this is clearly invalid. Reading the register a
> >> second time
> >> +	 * returns a valid value.
> >> +	 */
> >> +	if (unlikely(status == 0xff)) {
> >> +		rc = tpm_tis_read8(priv, TPM_STS(priv->locality),
> >> &status);
> >> +		if (rc < 0)
> >> +			return 0;
> >> +	}
> >
> >You theorize that your case is fixed by the second read, but what if it
> >isn't and the second read also returns 0xff?  Shouldn't we have a line
> >here saying
> >
> >if (unlikely(status == 0xff))
> >	status = 0;
> >
> >So if we get a second 0xff we just pretend the thing isn't ready?
> 
> Thanks for the fix, Omar!
> 
> I tried the patch and it helps with STM TPM2 issues where commands fail
> with the kernel reporting:
> tpm tpm0: Unable to read burstcount
> tpm tpm0: tpm_try_transmit: send(): error -16
> 
> My testing was with 5.4, and I'd like to see this CC-ed stable.
> 
> When trying to diagnose the issue before finding this patch, I found it
> was timing sensitive.  I was seeing failures in the OpenXT installer.
> The system is basically idle when issuing TPM commands which frequently
> failed.  The same hardware booted into a Fedora Live USB image didn't
> have any TPM command failures.  One notable difference between the two
> is Fedora is CONFIG_PREEMPT=y and OpenXT is CONFIG_PREEMPT_NONE=y.
> Switching OpenXT to PREEMPT=y helped some, but there were still some
> issues with commands failing.  The second interesting thing was running tpm
> commands in OpenXT under trace-cmd let them succeed.  I guess that was enough
> to throw the timing off.
> 
> Anyway, I'd like to see this patch applied, please.
> 
> Thanks,
> Jason

There was v2 sent after this:

https://patchwork.kernel.org/patch/11492125/

Unfortunately it lacks changelog. What was changed between v1 and v2?
Why v3 has not been sent yet? I see a discussion with no final
conclusion.

/Jarkko

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH] tpm_tis: work around status register bug in STMicroelectronics TPM
  2020-08-28 23:18     ` Jarkko Sakkinen
@ 2020-08-29  0:12       ` Jason Andryuk
  2020-08-31 13:55         ` Jarkko Sakkinen
  2020-09-04 12:03         ` Jarkko Sakkinen
  0 siblings, 2 replies; 24+ messages in thread
From: Jason Andryuk @ 2020-08-29  0:12 UTC (permalink / raw)
  To: Jarkko Sakkinen
  Cc: James Bottomley, Omar Sandoval, Peter Huewe, Jason Gunthorpe,
	linux-integrity

On Fri, Aug 28, 2020 at 7:18 PM Jarkko Sakkinen
<jarkko.sakkinen@linux.intel.com> wrote:
>
> On Thu, Aug 27, 2020 at 11:24:45AM -0400, Jason Andryuk wrote:
> > James Bottomley wrote:
> > >On Wed, 2020-04-15 at 15:45 -0700, Omar Sandoval wrote:
> > >> From: Omar Sandoval <osandov@fb.com>
> > >>
> > >> We've encountered a particular model of STMicroelectronics TPM that
> > >> transiently returns a bad value in the status register. This causes
> > >> the kernel to believe that the TPM is ready to receive a command when
> > >> it actually isn't, which in turn causes the send to time out in
> > >> get_burstcount(). In testing, reading the status register one extra
> > >> time convinces the TPM to return a valid value.
> > >
> > >Interesting, I've got a very early upgradeable nuvoton that seems to be
> > >behaving like this.
> > >
> > >> Signed-off-by: Omar Sandoval <osandov@fb.com>
> > >> ---
> > >>  drivers/char/tpm/tpm_tis_core.c | 12 ++++++++++++
> > >>  1 file changed, 12 insertions(+)
> > >>
> > >> diff --git a/drivers/char/tpm/tpm_tis_core.c
> > >> b/drivers/char/tpm/tpm_tis_core.c
> > >> index 27c6ca031e23..277a21027fc7 100644
> > >> --- a/drivers/char/tpm/tpm_tis_core.c
> > >> +++ b/drivers/char/tpm/tpm_tis_core.c
> > >> @@ -238,6 +238,18 @@ static u8 tpm_tis_status(struct tpm_chip *chip)
> > >>    rc = tpm_tis_read8(priv, TPM_STS(priv->locality), &status);
> > >>    if (rc < 0)
> > >>            return 0;
> > >> +  /*
> > >> +   * Some STMicroelectronics TPMs have a bug where the status
> > >> register is
> > >> +   * sometimes bogus (all 1s) if read immediately after the
> > >> access
> > >> +   * register is written to. Bits 0, 1, and 5 are always
> > >> supposed to read
> > >> +   * as 0, so this is clearly invalid. Reading the register a
> > >> second time
> > >> +   * returns a valid value.
> > >> +   */
> > >> +  if (unlikely(status == 0xff)) {
> > >> +          rc = tpm_tis_read8(priv, TPM_STS(priv->locality),
> > >> &status);
> > >> +          if (rc < 0)
> > >> +                  return 0;
> > >> +  }
> > >
> > >You theorize that your case is fixed by the second read, but what if it
> > >isn't and the second read also returns 0xff?  Shouldn't we have a line
> > >here saying
> > >
> > >if (unlikely(status == 0xff))
> > >     status = 0;
> > >
> > >So if we get a second 0xff we just pretend the thing isn't ready?
> >
> > Thanks for the fix, Omar!
> >
> > I tried the patch and it helps with STM TPM2 issues where commands fail
> > with the kernel reporting:
> > tpm tpm0: Unable to read burstcount
> > tpm tpm0: tpm_try_transmit: send(): error -16
> >
> > My testing was with 5.4, and I'd like to see this CC-ed stable.
> >
> > When trying to diagnose the issue before finding this patch, I found it
> > was timing sensitive.  I was seeing failures in the OpenXT installer.
> > The system is basically idle when issuing TPM commands which frequently
> > failed.  The same hardware booted into a Fedora Live USB image didn't
> > have any TPM command failures.  One notable difference between the two
> > is Fedora is CONFIG_PREEMPT=y and OpenXT is CONFIG_PREEMPT_NONE=y.
> > Switching OpenXT to PREEMPT=y helped some, but there were still some
> > issues with commands failing.  The second interesting thing was running tpm
> > commands in OpenXT under trace-cmd let them succeed.  I guess that was enough
> > to throw the timing off.
> >
> > Anyway, I'd like to see this patch applied, please.
> >
> > Thanks,
> > Jason
>
> There was v2 sent after this:
>
> https://patchwork.kernel.org/patch/11492125/

Thanks!  That one didn't come up in a search for STM on lore.kernel.org.

> Unfortunately it lacks changelog. What was changed between v1 and v2?
> Why v3 has not been sent yet? I see a discussion with no final
> conclusion.

Looks like v2 added James's suggestion with a comment (sorry the
formating is off):

+ /*
+ * The status is somehow still bad. This hasn't been observed in
+ * practice, but clear it just in case so that it doesn't appear
+ * to be ready.
+ */
+ if (unlikely(status == 0xff))
+         status = 0;

But, yes, the decision on the alternate approach is unresolved.

Thanks again,
Jason

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH] tpm_tis: work around status register bug in STMicroelectronics TPM
  2020-08-29  0:12       ` Jason Andryuk
@ 2020-08-31 13:55         ` Jarkko Sakkinen
  2020-09-04 12:03         ` Jarkko Sakkinen
  1 sibling, 0 replies; 24+ messages in thread
From: Jarkko Sakkinen @ 2020-08-31 13:55 UTC (permalink / raw)
  To: Jason Andryuk
  Cc: James Bottomley, Omar Sandoval, Peter Huewe, Jason Gunthorpe,
	linux-integrity

On Fri, Aug 28, 2020 at 08:12:55PM -0400, Jason Andryuk wrote:
> On Fri, Aug 28, 2020 at 7:18 PM Jarkko Sakkinen
> <jarkko.sakkinen@linux.intel.com> wrote:
> >
> > On Thu, Aug 27, 2020 at 11:24:45AM -0400, Jason Andryuk wrote:
> > > James Bottomley wrote:
> > > >On Wed, 2020-04-15 at 15:45 -0700, Omar Sandoval wrote:
> > > >> From: Omar Sandoval <osandov@fb.com>
> > > >>
> > > >> We've encountered a particular model of STMicroelectronics TPM that
> > > >> transiently returns a bad value in the status register. This causes
> > > >> the kernel to believe that the TPM is ready to receive a command when
> > > >> it actually isn't, which in turn causes the send to time out in
> > > >> get_burstcount(). In testing, reading the status register one extra
> > > >> time convinces the TPM to return a valid value.
> > > >
> > > >Interesting, I've got a very early upgradeable nuvoton that seems to be
> > > >behaving like this.
> > > >
> > > >> Signed-off-by: Omar Sandoval <osandov@fb.com>
> > > >> ---
> > > >>  drivers/char/tpm/tpm_tis_core.c | 12 ++++++++++++
> > > >>  1 file changed, 12 insertions(+)
> > > >>
> > > >> diff --git a/drivers/char/tpm/tpm_tis_core.c
> > > >> b/drivers/char/tpm/tpm_tis_core.c
> > > >> index 27c6ca031e23..277a21027fc7 100644
> > > >> --- a/drivers/char/tpm/tpm_tis_core.c
> > > >> +++ b/drivers/char/tpm/tpm_tis_core.c
> > > >> @@ -238,6 +238,18 @@ static u8 tpm_tis_status(struct tpm_chip *chip)
> > > >>    rc = tpm_tis_read8(priv, TPM_STS(priv->locality), &status);
> > > >>    if (rc < 0)
> > > >>            return 0;
> > > >> +  /*
> > > >> +   * Some STMicroelectronics TPMs have a bug where the status
> > > >> register is
> > > >> +   * sometimes bogus (all 1s) if read immediately after the
> > > >> access
> > > >> +   * register is written to. Bits 0, 1, and 5 are always
> > > >> supposed to read
> > > >> +   * as 0, so this is clearly invalid. Reading the register a
> > > >> second time
> > > >> +   * returns a valid value.
> > > >> +   */
> > > >> +  if (unlikely(status == 0xff)) {
> > > >> +          rc = tpm_tis_read8(priv, TPM_STS(priv->locality),
> > > >> &status);
> > > >> +          if (rc < 0)
> > > >> +                  return 0;
> > > >> +  }
> > > >
> > > >You theorize that your case is fixed by the second read, but what if it
> > > >isn't and the second read also returns 0xff?  Shouldn't we have a line
> > > >here saying
> > > >
> > > >if (unlikely(status == 0xff))
> > > >     status = 0;
> > > >
> > > >So if we get a second 0xff we just pretend the thing isn't ready?
> > >
> > > Thanks for the fix, Omar!
> > >
> > > I tried the patch and it helps with STM TPM2 issues where commands fail
> > > with the kernel reporting:
> > > tpm tpm0: Unable to read burstcount
> > > tpm tpm0: tpm_try_transmit: send(): error -16
> > >
> > > My testing was with 5.4, and I'd like to see this CC-ed stable.
> > >
> > > When trying to diagnose the issue before finding this patch, I found it
> > > was timing sensitive.  I was seeing failures in the OpenXT installer.
> > > The system is basically idle when issuing TPM commands which frequently
> > > failed.  The same hardware booted into a Fedora Live USB image didn't
> > > have any TPM command failures.  One notable difference between the two
> > > is Fedora is CONFIG_PREEMPT=y and OpenXT is CONFIG_PREEMPT_NONE=y.
> > > Switching OpenXT to PREEMPT=y helped some, but there were still some
> > > issues with commands failing.  The second interesting thing was running tpm
> > > commands in OpenXT under trace-cmd let them succeed.  I guess that was enough
> > > to throw the timing off.
> > >
> > > Anyway, I'd like to see this patch applied, please.
> > >
> > > Thanks,
> > > Jason
> >
> > There was v2 sent after this:
> >
> > https://patchwork.kernel.org/patch/11492125/
> 
> Thanks!  That one didn't come up in a search for STM on lore.kernel.org.
> 
> > Unfortunately it lacks changelog. What was changed between v1 and v2?
> > Why v3 has not been sent yet? I see a discussion with no final
> > conclusion.
> 
> Looks like v2 added James's suggestion with a comment (sorry the
> formating is off):
> 
> + /*
> + * The status is somehow still bad. This hasn't been observed in
> + * practice, but clear it just in case so that it doesn't appear
> + * to be ready.
> + */
> + if (unlikely(status == 0xff))
> +         status = 0;
> 
> But, yes, the decision on the alternate approach is unresolved.

Ya, has been a while so had to check.

Also, looking at the discussion in the past I'm very confused is this
problem related to particular models of a particular vendor or does this
occur among multiple vendors.

Also, the inline comment needs a rewrite. It literally says that this
has not been observed in practice, which literally tells me not to
merge this one. Also the explanation "status is somewhow bad" does not
really make any sense. I don't know what it means something is "somehow
bad".



This must be clarified in the commit message.

> Thanks again,
> Jason

/Jarkko

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH] tpm_tis: work around status register bug in STMicroelectronics TPM
  2020-08-29  0:12       ` Jason Andryuk
  2020-08-31 13:55         ` Jarkko Sakkinen
@ 2020-09-04 12:03         ` Jarkko Sakkinen
  1 sibling, 0 replies; 24+ messages in thread
From: Jarkko Sakkinen @ 2020-09-04 12:03 UTC (permalink / raw)
  To: Jason Andryuk
  Cc: James Bottomley, Omar Sandoval, Peter Huewe, Jason Gunthorpe,
	linux-integrity

On Fri, Aug 28, 2020 at 08:12:55PM -0400, Jason Andryuk wrote:
> On Fri, Aug 28, 2020 at 7:18 PM Jarkko Sakkinen
> <jarkko.sakkinen@linux.intel.com> wrote:
> >
> > On Thu, Aug 27, 2020 at 11:24:45AM -0400, Jason Andryuk wrote:
> > > James Bottomley wrote:
> > > >On Wed, 2020-04-15 at 15:45 -0700, Omar Sandoval wrote:
> > > >> From: Omar Sandoval <osandov@fb.com>
> > > >>
> > > >> We've encountered a particular model of STMicroelectronics TPM that
> > > >> transiently returns a bad value in the status register. This causes
> > > >> the kernel to believe that the TPM is ready to receive a command when
> > > >> it actually isn't, which in turn causes the send to time out in
> > > >> get_burstcount(). In testing, reading the status register one extra
> > > >> time convinces the TPM to return a valid value.
> > > >
> > > >Interesting, I've got a very early upgradeable nuvoton that seems to be
> > > >behaving like this.
> > > >
> > > >> Signed-off-by: Omar Sandoval <osandov@fb.com>
> > > >> ---
> > > >>  drivers/char/tpm/tpm_tis_core.c | 12 ++++++++++++
> > > >>  1 file changed, 12 insertions(+)
> > > >>
> > > >> diff --git a/drivers/char/tpm/tpm_tis_core.c
> > > >> b/drivers/char/tpm/tpm_tis_core.c
> > > >> index 27c6ca031e23..277a21027fc7 100644
> > > >> --- a/drivers/char/tpm/tpm_tis_core.c
> > > >> +++ b/drivers/char/tpm/tpm_tis_core.c
> > > >> @@ -238,6 +238,18 @@ static u8 tpm_tis_status(struct tpm_chip *chip)
> > > >>    rc = tpm_tis_read8(priv, TPM_STS(priv->locality), &status);
> > > >>    if (rc < 0)
> > > >>            return 0;
> > > >> +  /*
> > > >> +   * Some STMicroelectronics TPMs have a bug where the status
> > > >> register is
> > > >> +   * sometimes bogus (all 1s) if read immediately after the
> > > >> access
> > > >> +   * register is written to. Bits 0, 1, and 5 are always
> > > >> supposed to read
> > > >> +   * as 0, so this is clearly invalid. Reading the register a
> > > >> second time
> > > >> +   * returns a valid value.
> > > >> +   */
> > > >> +  if (unlikely(status == 0xff)) {
> > > >> +          rc = tpm_tis_read8(priv, TPM_STS(priv->locality),
> > > >> &status);
> > > >> +          if (rc < 0)
> > > >> +                  return 0;
> > > >> +  }
> > > >
> > > >You theorize that your case is fixed by the second read, but what if it
> > > >isn't and the second read also returns 0xff?  Shouldn't we have a line
> > > >here saying
> > > >
> > > >if (unlikely(status == 0xff))
> > > >     status = 0;
> > > >
> > > >So if we get a second 0xff we just pretend the thing isn't ready?
> > >
> > > Thanks for the fix, Omar!
> > >
> > > I tried the patch and it helps with STM TPM2 issues where commands fail
> > > with the kernel reporting:
> > > tpm tpm0: Unable to read burstcount
> > > tpm tpm0: tpm_try_transmit: send(): error -16
> > >
> > > My testing was with 5.4, and I'd like to see this CC-ed stable.
> > >
> > > When trying to diagnose the issue before finding this patch, I found it
> > > was timing sensitive.  I was seeing failures in the OpenXT installer.
> > > The system is basically idle when issuing TPM commands which frequently
> > > failed.  The same hardware booted into a Fedora Live USB image didn't
> > > have any TPM command failures.  One notable difference between the two
> > > is Fedora is CONFIG_PREEMPT=y and OpenXT is CONFIG_PREEMPT_NONE=y.
> > > Switching OpenXT to PREEMPT=y helped some, but there were still some
> > > issues with commands failing.  The second interesting thing was running tpm
> > > commands in OpenXT under trace-cmd let them succeed.  I guess that was enough
> > > to throw the timing off.
> > >
> > > Anyway, I'd like to see this patch applied, please.
> > >
> > > Thanks,
> > > Jason
> >
> > There was v2 sent after this:
> >
> > https://patchwork.kernel.org/patch/11492125/
> 
> Thanks!  That one didn't come up in a search for STM on lore.kernel.org.
> 
> > Unfortunately it lacks changelog. What was changed between v1 and v2?
> > Why v3 has not been sent yet? I see a discussion with no final
> > conclusion.
> 
> Looks like v2 added James's suggestion with a comment (sorry the
> formating is off):
> 
> + /*
> + * The status is somehow still bad. This hasn't been observed in
> + * practice, but clear it just in case so that it doesn't appear
> + * to be ready.
> + */
> + if (unlikely(status == 0xff))
> +         status = 0;
> 
> But, yes, the decision on the alternate approach is unresolved.
> 
> Thanks again,
> Jason

I'm happy to apply this patch as soon as there is either v3 or some
resolution to v2 discussion.

/Jarkko

^ permalink raw reply	[flat|nested] 24+ messages in thread

end of thread, other threads:[~2020-09-04 12:03 UTC | newest]

Thread overview: 24+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-04-15 22:45 [PATCH] tpm_tis: work around status register bug in STMicroelectronics TPM Omar Sandoval
2020-04-15 23:51 ` James Bottomley
2020-04-16  0:16   ` Omar Sandoval
2020-04-16  0:24     ` Omar Sandoval
2020-04-16 18:02       ` James Bottomley
2020-04-17 23:55         ` Jarkko Sakkinen
2020-04-18  0:12           ` James Bottomley
2020-04-20 20:46             ` Jarkko Sakkinen
2020-04-20 22:28               ` James Bottomley
2020-04-21 14:36                 ` Mimi Zohar
2020-04-21 20:25                   ` Jarkko Sakkinen
2020-04-21 20:31                     ` Mimi Zohar
2020-04-21 20:23                 ` Jarkko Sakkinen
2020-04-21 22:08                   ` James Bottomley
2020-04-16 17:09   ` Jarkko Sakkinen
2020-04-16 17:56     ` James Bottomley
2020-08-27 15:24   ` Jason Andryuk
2020-08-28 23:18     ` Jarkko Sakkinen
2020-08-29  0:12       ` Jason Andryuk
2020-08-31 13:55         ` Jarkko Sakkinen
2020-09-04 12:03         ` Jarkko Sakkinen
2020-04-16 17:08 ` Jarkko Sakkinen
2020-04-16 18:54   ` Omar Sandoval
2020-04-17 23:54     ` Jarkko Sakkinen

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.