linux-integrity.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] Fix Atmel TPM crash caused by too frequent queries
@ 2020-09-14  6:13 Hao Wu
  2020-09-14  6:17 ` Greg KH
  2020-09-15  2:52 ` Hao Wu
  0 siblings, 2 replies; 51+ messages in thread
From: Hao Wu @ 2020-09-14  6:13 UTC (permalink / raw)
  To: peterhuewe, jarkko.sakkinen, jgg, arnd, gregkh, hamza, james.l.morris
  Cc: linux-integrity, pmenzel, kgold, seungyeop.han, shrihari.kalkar,
	hao.wu, anish.jhaveri

Since kernel 4.14, we fixed the TPM sleep logic
from msleep to usleep_range, so that the TPM
sleeps exactly with TPM_TIMEOUT (=5ms) afterward.
Before that fix, msleep(5) actually sleeps for
around 15ms.
The fix is https://github.com/torvalds/linux/commit/9f3fc7bcddcb51234e23494531f93ab60475e1c3

That fix uncovered that the TPM_TIMEOUT was not properly
set previously. We recently found the TPM driver in kernel 4.14+
(including 5.9-rc4) crashes Atmel TPM chips with
too frequent TPM queries.

The TPM crash signature is
```
$ tpm_sealdata -z
Tspi_Key_LoadKey failed: 0x00001087 - layer=tddl, code=0087 (135), I/O error

$ sudo dmesg | grep tpm0
[59154.665549] tpm tpm0: tpm_try_transmit: send(): error -62
[59154.809532] tpm tpm0: tpm_try_transmit: send(): error -62
```

From the error code "-62", it looks similar to another bug
https://patchwork.kernel.org/patch/10520247/
where the "TPM_TIMEOUT_USECS_MAX" and "TPM_TIMEOUT_USEC_MIN"
is too small, which causes TPM get queried too frequently,
and thus crashes.

This patch fix the TPM_TIMEOUT to 15ms which was
the actual timeout TPM chips use before the fix
from msleep to usleep_range. Thus fixed the crash.

Test Plan:
- Run fixed kernel on system with Atmel TPM chips
  and ensure crash does not happen
- Run fixed kernel on system with other TPM chips
  (IFX / WEC / STM) ensure not breakages from tpm-tool
---
 drivers/char/tpm/tpm.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/char/tpm/tpm.h b/drivers/char/tpm/tpm.h
index 947d1db0a5cc..73259ac0a997 100644
--- a/drivers/char/tpm/tpm.h
+++ b/drivers/char/tpm/tpm.h
@@ -37,7 +37,7 @@
 #define TPM_RETRY		50
 
 enum tpm_timeout {
-	TPM_TIMEOUT = 5,	/* msecs */
+	TPM_TIMEOUT = 15,	/* msecs */
 	TPM_TIMEOUT_RETRY = 100, /* msecs */
 	TPM_TIMEOUT_RANGE_US = 300,	/* usecs */
 	TPM_TIMEOUT_POLL = 1,	/* msecs */
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 51+ messages in thread
* [PATCH] Fix Atmel TPM crash caused by too frequent queries
@ 2020-09-26 22:31 Hao Wu
  2020-09-26 22:57 ` James Bottomley
  0 siblings, 1 reply; 51+ messages in thread
From: Hao Wu @ 2020-09-26 22:31 UTC (permalink / raw)
  To: peterhuewe, jarkko.sakkinen, jgg, arnd, gregkh, hamza,
	james.l.morris, nayna, why2jjj.linux, zohar
  Cc: linux-integrity, pmenzel, kgold, seungyeop.han, shrihari.kalkar,
	anish.jhaveri, hao.wu

Since kernel 4.14, we fixed the TPM sleep logic
from msleep to usleep_range, so that the TPM
sleeps exactly with TPM_TIMEOUT (=5ms) afterward.
Before that fix, msleep(5) actually sleeps for
around 15ms.
The fix is https://github.com/torvalds/linux/commit/9f3fc7bcddcb51234e23494531f93ab60475e1c3

That fix uncovered that the TPM_TIMEOUT was not properly
set previously. We recently found the TPM driver in kernel 4.14+
(including 5.9-rc4) crashes Atmel TPM chips with
too frequent TPM queries.

The TPM crash signature is
```
$ tpm_sealdata -z
Tspi_Key_LoadKey failed: 0x00001087 - layer=tddl, code=0087 (135), I/O error

$ sudo dmesg | grep tpm0
[59154.665549] tpm tpm0: tpm_try_transmit: send(): error -62
[59154.809532] tpm tpm0: tpm_try_transmit: send(): error -62
```

From the error code "-62", it looks similar to another bug
https://patchwork.kernel.org/patch/10520247/
where the "TPM_TIMEOUT_USECS_MAX" and "TPM_TIMEOUT_USEC_MIN"
is too small, which causes TPM get queried too frequently,
and thus crashes.

We finally found the root cause is related to
the sleep timeout used in `wait_for_tpm_stat`

In the 4.16, commit
https://github.com/torvalds/linux/commit/cf151a9a44d52a63332e8e926234574fe5a5d784
uses `TPM_POLL_SLEEP` instead of TPM_TIMEOUT for `wait_for_tpm_stat`
and set `TPM_POLL_SLEEP` (1ms).

Since 4.18 commits
https://github.com/torvalds/linux/commit/59f5a6b07f6434efac0057dc2f303a96b871811b
https://github.com/torvalds/linux/commit/424eaf910c329ab06ad03a527ef45dcf6a328f00
further reduced the timeout in wait_for_tpm_stat to less than 1ms.

Our patch here defines a `TPM_TIMEOUT_WAIT_STAT` (15ms) just for
`wait_for_tpm_stat the` to fix the crash in Atmel chips,
but not introduce unnecessary performance regression
in other workflows.
15ms is the actual timeout TPM chips use before the 4.14 fix
from msleep to usleep_range. Thus fixed the crash.

Exploring smaller `TPM_TIMEOUT_WAIT_STAT` should be a separate
thing to revisit later. This patch meant to fix the regression
introduced since 4.14

Test Plan:
- Run fixed kernel on system with Atmel TPM chips
  and ensure crash does not happen
- Run fixed kernel on system with other TPM chips
  (IFX / WEC / STM) ensure not breakages from tpm-tool

Signed-off-by: Hao Wu <hao.wu@rubrik.com>
---
 drivers/char/tpm/tpm.h          | 1 +
 drivers/char/tpm/tpm_tis_core.c | 3 +--
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/char/tpm/tpm.h b/drivers/char/tpm/tpm.h
index 947d1db0a5cc..899097ae9756 100644
--- a/drivers/char/tpm/tpm.h
+++ b/drivers/char/tpm/tpm.h
@@ -41,6 +41,7 @@ enum tpm_timeout {
 	TPM_TIMEOUT_RETRY = 100, /* msecs */
 	TPM_TIMEOUT_RANGE_US = 300,	/* usecs */
 	TPM_TIMEOUT_POLL = 1,	/* msecs */
+	TPM_TIMEOUT_WAIT_STAT = 15,      /* msecs */
 	TPM_TIMEOUT_USECS_MIN = 100,      /* usecs */
 	TPM_TIMEOUT_USECS_MAX = 500      /* usecs */
 };
diff --git a/drivers/char/tpm/tpm_tis_core.c b/drivers/char/tpm/tpm_tis_core.c
index 65ab1b027949..8aa5eef10c28 100644
--- a/drivers/char/tpm/tpm_tis_core.c
+++ b/drivers/char/tpm/tpm_tis_core.c
@@ -80,8 +80,7 @@ static int wait_for_tpm_stat(struct tpm_chip *chip, u8 mask,
 		}
 	} else {
 		do {
-			usleep_range(TPM_TIMEOUT_USECS_MIN,
-				     TPM_TIMEOUT_USECS_MAX);
+			tpm_msleep(TPM_TIMEOUT_WAIT_STAT);
 			status = chip->ops->status(chip);
 			if ((status & mask) == mask)
 				return 0;
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 51+ messages in thread
* [PATCH] Fix Atmel TPM crash caused by too frequent queries
@ 2021-06-20 23:18 Hao Wu
  2021-06-23 13:35 ` Jarkko Sakkinen
  2021-06-24  5:33 ` Hao Wu
  0 siblings, 2 replies; 51+ messages in thread
From: Hao Wu @ 2021-06-20 23:18 UTC (permalink / raw)
  To: hao.wu, shrihari.kalkar, seungyeop.han, anish.jhaveri,
	peterhuewe, jarkko, jgg, linux-integrity, pmenzel, kgold, zohar,
	why2jjj.linux, hamza, gregkh, arnd, nayna, James.Bottomley

This is a fix for the ATMEL TPM crash bug reported in
https://patchwork.kernel.org/project/linux-integrity/patch/20200926223150.109645-1-hao.wu@rubrik.com/

According to the discussions in the original thread,
we don't want to revert the timeout of wait_for_tpm_stat
for non-ATMEL chips, which brings back the performance cost.
For investigation and analysis of why wait_for_tpm_stat
caused the issue, and how the regression was introduced,
please read the original thread above.

Thus the proposed fix here is to only revert the timeout
for ATMEL chips by checking the vendor ID.

Test Plan:
- Run fixed kernel with ATMEL TPM chips and see crash
  has been fixed.
- Run fixed kernel with non-ATMEL TPM chips, and confirm
  the timeout has not been changed.
---
 drivers/char/tpm/tpm.h          |  9 ++++++++-
 drivers/char/tpm/tpm_tis_core.c | 19 +++++++++++++++++--
 include/linux/tpm.h             |  2 ++
 3 files changed, 27 insertions(+), 3 deletions(-)

diff --git a/drivers/char/tpm/tpm.h b/drivers/char/tpm/tpm.h
index 283f78211c3a..bc6aa7f9e119 100644
--- a/drivers/char/tpm/tpm.h
+++ b/drivers/char/tpm/tpm.h
@@ -42,7 +42,9 @@ enum tpm_timeout {
 	TPM_TIMEOUT_RANGE_US = 300,	/* usecs */
 	TPM_TIMEOUT_POLL = 1,	/* msecs */
 	TPM_TIMEOUT_USECS_MIN = 100,      /* usecs */
-	TPM_TIMEOUT_USECS_MAX = 500      /* usecs */
+	TPM_TIMEOUT_USECS_MAX = 500,	/* usecs */
+	TPM_TIMEOUT_WAIT_STAT = 500,	/* usecs */
+	TPM_ATML_TIMEOUT_WAIT_STAT = 15000	/* usecs */
 };
 
 /* TPM addresses */
@@ -189,6 +191,11 @@ static inline void tpm_msleep(unsigned int delay_msec)
 		     delay_msec * 1000);
 };
 
+static inline void tpm_usleep(unsigned int delay_usec)
+{
+	usleep_range(delay_usec - TPM_TIMEOUT_RANGE_US, delay_usec);
+};
+
 int tpm_chip_start(struct tpm_chip *chip);
 void tpm_chip_stop(struct tpm_chip *chip);
 struct tpm_chip *tpm_find_get_ops(struct tpm_chip *chip);
diff --git a/drivers/char/tpm/tpm_tis_core.c b/drivers/char/tpm/tpm_tis_core.c
index 55b9d3965ae1..9ddd4edfe1c2 100644
--- a/drivers/char/tpm/tpm_tis_core.c
+++ b/drivers/char/tpm/tpm_tis_core.c
@@ -80,8 +80,12 @@ static int wait_for_tpm_stat(struct tpm_chip *chip, u8 mask,
 		}
 	} else {
 		do {
-			usleep_range(TPM_TIMEOUT_USECS_MIN,
-				     TPM_TIMEOUT_USECS_MAX);
+			if (chip->timeout_wait_stat && 
+				chip->timeout_wait_stat >= TPM_TIMEOUT_WAIT_STAT) {
+				tpm_usleep((unsigned int)(chip->timeout_wait_stat));
+			} else {
+				tpm_usleep((unsigned int)(TPM_TIMEOUT_WAIT_STAT));
+			}
 			status = chip->ops->status(chip);
 			if ((status & mask) == mask)
 				return 0;
@@ -934,6 +938,8 @@ int tpm_tis_core_init(struct device *dev, struct tpm_tis_data *priv, int irq,
 	chip->timeout_b = msecs_to_jiffies(TIS_TIMEOUT_B_MAX);
 	chip->timeout_c = msecs_to_jiffies(TIS_TIMEOUT_C_MAX);
 	chip->timeout_d = msecs_to_jiffies(TIS_TIMEOUT_D_MAX);
+	/* init timeout for wait_for_tpm_stat */
+	chip->timeout_wait_stat = TPM_TIMEOUT_WAIT_STAT;
 	priv->phy_ops = phy_ops;
 	dev_set_drvdata(&chip->dev, priv);
 
@@ -983,6 +989,15 @@ int tpm_tis_core_init(struct device *dev, struct tpm_tis_data *priv, int irq,
 
 	priv->manufacturer_id = vendor;
 
+	switch (priv->manufacturer_id) {
+	case TPM_VID_ATML:
+        /* ATMEL chip needs longer timeout to avoid crash */
+		chip->timeout_wait_stat = TPM_ATML_TIMEOUT_WAIT_STAT;
+		break;
+	default:
+		chip->timeout_wait_stat = TPM_TIMEOUT_WAIT_STAT;
+	}
+
 	rc = tpm_tis_read8(priv, TPM_RID(0), &rid);
 	if (rc < 0)
 		goto out_err;
diff --git a/include/linux/tpm.h b/include/linux/tpm.h
index aa11fe323c56..35f2a0260d76 100644
--- a/include/linux/tpm.h
+++ b/include/linux/tpm.h
@@ -150,6 +150,7 @@ struct tpm_chip {
 	bool timeout_adjusted;
 	unsigned long duration[TPM_NUM_DURATIONS]; /* jiffies */
 	bool duration_adjusted;
+	unsigned long timeout_wait_stat; /* usecs */
 
 	struct dentry *bios_dir[TPM_NUM_EVENT_LOG_FILES];
 
@@ -269,6 +270,7 @@ enum tpm2_cc_attrs {
 #define TPM_VID_INTEL    0x8086
 #define TPM_VID_WINBOND  0x1050
 #define TPM_VID_STM      0x104A
+#define TPM_VID_ATML     0x1114
 
 enum tpm_chip_flags {
 	TPM_CHIP_FLAG_TPM2		= BIT(1),
-- 
2.29.0.vfs.0.0


^ permalink raw reply related	[flat|nested] 51+ messages in thread

end of thread, other threads:[~2021-06-30  4:27 UTC | newest]

Thread overview: 51+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-09-14  6:13 [PATCH] Fix Atmel TPM crash caused by too frequent queries Hao Wu
2020-09-14  6:17 ` Greg KH
2020-09-15  2:52 ` Hao Wu
2020-09-26 22:31 Hao Wu
2020-09-26 22:57 ` James Bottomley
2020-09-26 23:10   ` Hao Wu
2020-09-27 18:25     ` James Bottomley
2020-09-28  0:11       ` Hao Wu
2020-09-28  0:15         ` Hao Wu
2020-09-28  1:22         ` James Bottomley
2020-09-28  5:59           ` Hao Wu
2020-09-28 22:11             ` James Bottomley
2020-09-29  4:46               ` Hao Wu
2020-09-30  2:16               ` Jarkko Sakkinen
2020-09-30 14:54                 ` James Bottomley
2020-09-30 15:37                   ` Jarkko Sakkinen
2020-09-30 20:48                     ` James Bottomley
2020-09-30 21:09                       ` Jarkko Sakkinen
2020-09-30 22:31                         ` James Bottomley
2020-10-01  1:50                           ` Jarkko Sakkinen
2020-10-01  4:53                             ` James Bottomley
2020-10-01 18:15                               ` Nayna
2020-10-01 18:32                                 ` James Bottomley
2020-10-01 23:04                                   ` Jarkko Sakkinen
2020-10-17  6:11                                     ` Hao Wu
2020-10-18  5:09                                       ` Jarkko Sakkinen
2020-10-18  5:20                                         ` Hao Wu
2020-11-14  4:39                                           ` Hao Wu
2020-11-18 21:11                                             ` Jarkko Sakkinen
2020-11-18 23:23                                               ` Hao Wu
2021-05-09  6:18                                               ` Hao Wu
2021-05-09  6:31                                                 ` Hao Wu
2021-05-10  2:17                                                   ` Mimi Zohar
2021-05-10  3:15                                                     ` Hao Wu
2021-05-10 17:28                                                     ` Jarkko Sakkinen
2020-09-28  1:08       ` Jarkko Sakkinen
2020-09-28  6:03         ` Hao Wu
2020-09-28 14:16           ` Jarkko Sakkinen
2020-09-28 17:49             ` Hao Wu
2020-09-28 19:47               ` Jarkko Sakkinen
2020-09-28 20:27                 ` Hao Wu
2020-09-30  2:11                   ` Jarkko Sakkinen
2020-09-30  3:41                     ` Hao Wu
     [not found]                       ` <EA1EE8F8-F054-4E1B-B830-231398D33CB8@rubrik.com>
2020-10-01 14:16                         ` Mimi Zohar
2021-06-20 23:18 Hao Wu
2021-06-23 13:35 ` Jarkko Sakkinen
2021-06-24  5:49   ` Hao Wu
2021-06-29 20:06     ` Jarkko Sakkinen
2021-06-30  4:27       ` Hao Wu
2021-06-24  5:33 ` Hao Wu
2021-06-29 20:07   ` Jarkko Sakkinen

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).