linux-integrity.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] Fix Atmel TPM crash caused by too frequent queries
@ 2021-06-20 23:18 Hao Wu
  2021-06-23 13:35 ` Jarkko Sakkinen
  2021-06-24  5:33 ` Hao Wu
  0 siblings, 2 replies; 91+ messages in thread
From: Hao Wu @ 2021-06-20 23:18 UTC (permalink / raw)
  To: hao.wu, shrihari.kalkar, seungyeop.han, anish.jhaveri,
	peterhuewe, jarkko, jgg, linux-integrity, pmenzel, kgold, zohar,
	why2jjj.linux, hamza, gregkh, arnd, nayna, James.Bottomley

This is a fix for the ATMEL TPM crash bug reported in
https://patchwork.kernel.org/project/linux-integrity/patch/20200926223150.109645-1-hao.wu@rubrik.com/

According to the discussions in the original thread,
we don't want to revert the timeout of wait_for_tpm_stat
for non-ATMEL chips, which brings back the performance cost.
For investigation and analysis of why wait_for_tpm_stat
caused the issue, and how the regression was introduced,
please read the original thread above.

Thus the proposed fix here is to only revert the timeout
for ATMEL chips by checking the vendor ID.

Test Plan:
- Run fixed kernel with ATMEL TPM chips and see crash
  has been fixed.
- Run fixed kernel with non-ATMEL TPM chips, and confirm
  the timeout has not been changed.
---
 drivers/char/tpm/tpm.h          |  9 ++++++++-
 drivers/char/tpm/tpm_tis_core.c | 19 +++++++++++++++++--
 include/linux/tpm.h             |  2 ++
 3 files changed, 27 insertions(+), 3 deletions(-)

diff --git a/drivers/char/tpm/tpm.h b/drivers/char/tpm/tpm.h
index 283f78211c3a..bc6aa7f9e119 100644
--- a/drivers/char/tpm/tpm.h
+++ b/drivers/char/tpm/tpm.h
@@ -42,7 +42,9 @@ enum tpm_timeout {
 	TPM_TIMEOUT_RANGE_US = 300,	/* usecs */
 	TPM_TIMEOUT_POLL = 1,	/* msecs */
 	TPM_TIMEOUT_USECS_MIN = 100,      /* usecs */
-	TPM_TIMEOUT_USECS_MAX = 500      /* usecs */
+	TPM_TIMEOUT_USECS_MAX = 500,	/* usecs */
+	TPM_TIMEOUT_WAIT_STAT = 500,	/* usecs */
+	TPM_ATML_TIMEOUT_WAIT_STAT = 15000	/* usecs */
 };
 
 /* TPM addresses */
@@ -189,6 +191,11 @@ static inline void tpm_msleep(unsigned int delay_msec)
 		     delay_msec * 1000);
 };
 
+static inline void tpm_usleep(unsigned int delay_usec)
+{
+	usleep_range(delay_usec - TPM_TIMEOUT_RANGE_US, delay_usec);
+};
+
 int tpm_chip_start(struct tpm_chip *chip);
 void tpm_chip_stop(struct tpm_chip *chip);
 struct tpm_chip *tpm_find_get_ops(struct tpm_chip *chip);
diff --git a/drivers/char/tpm/tpm_tis_core.c b/drivers/char/tpm/tpm_tis_core.c
index 55b9d3965ae1..9ddd4edfe1c2 100644
--- a/drivers/char/tpm/tpm_tis_core.c
+++ b/drivers/char/tpm/tpm_tis_core.c
@@ -80,8 +80,12 @@ static int wait_for_tpm_stat(struct tpm_chip *chip, u8 mask,
 		}
 	} else {
 		do {
-			usleep_range(TPM_TIMEOUT_USECS_MIN,
-				     TPM_TIMEOUT_USECS_MAX);
+			if (chip->timeout_wait_stat && 
+				chip->timeout_wait_stat >= TPM_TIMEOUT_WAIT_STAT) {
+				tpm_usleep((unsigned int)(chip->timeout_wait_stat));
+			} else {
+				tpm_usleep((unsigned int)(TPM_TIMEOUT_WAIT_STAT));
+			}
 			status = chip->ops->status(chip);
 			if ((status & mask) == mask)
 				return 0;
@@ -934,6 +938,8 @@ int tpm_tis_core_init(struct device *dev, struct tpm_tis_data *priv, int irq,
 	chip->timeout_b = msecs_to_jiffies(TIS_TIMEOUT_B_MAX);
 	chip->timeout_c = msecs_to_jiffies(TIS_TIMEOUT_C_MAX);
 	chip->timeout_d = msecs_to_jiffies(TIS_TIMEOUT_D_MAX);
+	/* init timeout for wait_for_tpm_stat */
+	chip->timeout_wait_stat = TPM_TIMEOUT_WAIT_STAT;
 	priv->phy_ops = phy_ops;
 	dev_set_drvdata(&chip->dev, priv);
 
@@ -983,6 +989,15 @@ int tpm_tis_core_init(struct device *dev, struct tpm_tis_data *priv, int irq,
 
 	priv->manufacturer_id = vendor;
 
+	switch (priv->manufacturer_id) {
+	case TPM_VID_ATML:
+        /* ATMEL chip needs longer timeout to avoid crash */
+		chip->timeout_wait_stat = TPM_ATML_TIMEOUT_WAIT_STAT;
+		break;
+	default:
+		chip->timeout_wait_stat = TPM_TIMEOUT_WAIT_STAT;
+	}
+
 	rc = tpm_tis_read8(priv, TPM_RID(0), &rid);
 	if (rc < 0)
 		goto out_err;
diff --git a/include/linux/tpm.h b/include/linux/tpm.h
index aa11fe323c56..35f2a0260d76 100644
--- a/include/linux/tpm.h
+++ b/include/linux/tpm.h
@@ -150,6 +150,7 @@ struct tpm_chip {
 	bool timeout_adjusted;
 	unsigned long duration[TPM_NUM_DURATIONS]; /* jiffies */
 	bool duration_adjusted;
+	unsigned long timeout_wait_stat; /* usecs */
 
 	struct dentry *bios_dir[TPM_NUM_EVENT_LOG_FILES];
 
@@ -269,6 +270,7 @@ enum tpm2_cc_attrs {
 #define TPM_VID_INTEL    0x8086
 #define TPM_VID_WINBOND  0x1050
 #define TPM_VID_STM      0x104A
+#define TPM_VID_ATML     0x1114
 
 enum tpm_chip_flags {
 	TPM_CHIP_FLAG_TPM2		= BIT(1),
-- 
2.29.0.vfs.0.0


^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: [PATCH] Fix Atmel TPM crash caused by too frequent queries
  2021-06-20 23:18 [PATCH] Fix Atmel TPM crash caused by too frequent queries Hao Wu
@ 2021-06-23 13:35 ` Jarkko Sakkinen
  2021-06-24  5:49   ` Hao Wu
  2021-06-24  5:33 ` Hao Wu
  1 sibling, 1 reply; 91+ messages in thread
From: Jarkko Sakkinen @ 2021-06-23 13:35 UTC (permalink / raw)
  To: Hao Wu
  Cc: shrihari.kalkar, seungyeop.han, anish.jhaveri, peterhuewe, jgg,
	linux-integrity, pmenzel, kgold, zohar, why2jjj.linux, hamza,
	gregkh, arnd, nayna, James.Bottomley

On Sun, Jun 20, 2021 at 04:18:09PM -0700, Hao Wu wrote:
> This is a fix for the ATMEL TPM crash bug reported in
> https://patchwork.kernel.org/project/linux-integrity/patch/20200926223150.109645-1-hao.wu@rubrik.com/
> 
> According to the discussions in the original thread,
> we don't want to revert the timeout of wait_for_tpm_stat
> for non-ATMEL chips, which brings back the performance cost.
> For investigation and analysis of why wait_for_tpm_stat
> caused the issue, and how the regression was introduced,
> please read the original thread above.
> 
> Thus the proposed fix here is to only revert the timeout
> for ATMEL chips by checking the vendor ID.
> 
> Test Plan:
> - Run fixed kernel with ATMEL TPM chips and see crash
>   has been fixed.
> - Run fixed kernel with non-ATMEL TPM chips, and confirm
>   the timeout has not been changed.

Please move test plan right before diffstat if you wan to include such,
so that it does not go into the commit log.


> ---
>  drivers/char/tpm/tpm.h          |  9 ++++++++-
>  drivers/char/tpm/tpm_tis_core.c | 19 +++++++++++++++++--
>  include/linux/tpm.h             |  2 ++
>  3 files changed, 27 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/char/tpm/tpm.h b/drivers/char/tpm/tpm.h
> index 283f78211c3a..bc6aa7f9e119 100644
> --- a/drivers/char/tpm/tpm.h
> +++ b/drivers/char/tpm/tpm.h
> @@ -42,7 +42,9 @@ enum tpm_timeout {
>  	TPM_TIMEOUT_RANGE_US = 300,	/* usecs */
>  	TPM_TIMEOUT_POLL = 1,	/* msecs */
>  	TPM_TIMEOUT_USECS_MIN = 100,      /* usecs */
> -	TPM_TIMEOUT_USECS_MAX = 500      /* usecs */
> +	TPM_TIMEOUT_USECS_MAX = 500,	/* usecs */
> +	TPM_TIMEOUT_WAIT_STAT = 500,	/* usecs */
> +	TPM_ATML_TIMEOUT_WAIT_STAT = 15000	/* usecs */
>  };
>  
>  /* TPM addresses */
> @@ -189,6 +191,11 @@ static inline void tpm_msleep(unsigned int delay_msec)
>  		     delay_msec * 1000);
>  };
>  
> +static inline void tpm_usleep(unsigned int delay_usec)
> +{
> +	usleep_range(delay_usec - TPM_TIMEOUT_RANGE_US, delay_usec);
> +};
> +
>  int tpm_chip_start(struct tpm_chip *chip);
>  void tpm_chip_stop(struct tpm_chip *chip);
>  struct tpm_chip *tpm_find_get_ops(struct tpm_chip *chip);
> diff --git a/drivers/char/tpm/tpm_tis_core.c b/drivers/char/tpm/tpm_tis_core.c
> index 55b9d3965ae1..9ddd4edfe1c2 100644
> --- a/drivers/char/tpm/tpm_tis_core.c
> +++ b/drivers/char/tpm/tpm_tis_core.c
> @@ -80,8 +80,12 @@ static int wait_for_tpm_stat(struct tpm_chip *chip, u8 mask,
>  		}
>  	} else {
>  		do {
> -			usleep_range(TPM_TIMEOUT_USECS_MIN,
> -				     TPM_TIMEOUT_USECS_MAX);
> +			if (chip->timeout_wait_stat && 
> +				chip->timeout_wait_stat >= TPM_TIMEOUT_WAIT_STAT) {
> +				tpm_usleep((unsigned int)(chip->timeout_wait_stat));
> +			} else {
> +				tpm_usleep((unsigned int)(TPM_TIMEOUT_WAIT_STAT));
> +			}
>  			status = chip->ops->status(chip);
>  			if ((status & mask) == mask)
>  				return 0;
> @@ -934,6 +938,8 @@ int tpm_tis_core_init(struct device *dev, struct tpm_tis_data *priv, int irq,
>  	chip->timeout_b = msecs_to_jiffies(TIS_TIMEOUT_B_MAX);
>  	chip->timeout_c = msecs_to_jiffies(TIS_TIMEOUT_C_MAX);
>  	chip->timeout_d = msecs_to_jiffies(TIS_TIMEOUT_D_MAX);
> +	/* init timeout for wait_for_tpm_stat */
> +	chip->timeout_wait_stat = TPM_TIMEOUT_WAIT_STAT;
>  	priv->phy_ops = phy_ops;
>  	dev_set_drvdata(&chip->dev, priv);
>  
> @@ -983,6 +989,15 @@ int tpm_tis_core_init(struct device *dev, struct tpm_tis_data *priv, int irq,
>  
>  	priv->manufacturer_id = vendor;
>  
> +	switch (priv->manufacturer_id) {
> +	case TPM_VID_ATML:
> +        /* ATMEL chip needs longer timeout to avoid crash */
> +		chip->timeout_wait_stat = TPM_ATML_TIMEOUT_WAIT_STAT;
> +		break;
> +	default:
> +		chip->timeout_wait_stat = TPM_TIMEOUT_WAIT_STAT;
> +	}
> +
>  	rc = tpm_tis_read8(priv, TPM_RID(0), &rid);
>  	if (rc < 0)
>  		goto out_err;
> diff --git a/include/linux/tpm.h b/include/linux/tpm.h
> index aa11fe323c56..35f2a0260d76 100644
> --- a/include/linux/tpm.h
> +++ b/include/linux/tpm.h
> @@ -150,6 +150,7 @@ struct tpm_chip {
>  	bool timeout_adjusted;
>  	unsigned long duration[TPM_NUM_DURATIONS]; /* jiffies */
>  	bool duration_adjusted;
> +	unsigned long timeout_wait_stat; /* usecs */
>  
>  	struct dentry *bios_dir[TPM_NUM_EVENT_LOG_FILES];
>  
> @@ -269,6 +270,7 @@ enum tpm2_cc_attrs {
>  #define TPM_VID_INTEL    0x8086
>  #define TPM_VID_WINBOND  0x1050
>  #define TPM_VID_STM      0x104A
> +#define TPM_VID_ATML     0x1114
>  
>  enum tpm_chip_flags {
>  	TPM_CHIP_FLAG_TPM2		= BIT(1),
> -- 
> 2.29.0.vfs.0.0
> 
> 

/Jarkko

^ permalink raw reply	[flat|nested] 91+ messages in thread

* [PATCH] Fix Atmel TPM crash caused by too frequent queries
  2021-06-20 23:18 [PATCH] Fix Atmel TPM crash caused by too frequent queries Hao Wu
  2021-06-23 13:35 ` Jarkko Sakkinen
@ 2021-06-24  5:33 ` Hao Wu
  2021-06-29 20:07   ` Jarkko Sakkinen
  2021-06-30  4:22   ` [PATCH] tpm: fix ATMEL " Hao Wu
  1 sibling, 2 replies; 91+ messages in thread
From: Hao Wu @ 2021-06-24  5:33 UTC (permalink / raw)
  To: hao.wu, shrihari.kalkar, seungyeop.han, anish.jhaveri,
	peterhuewe, jarkko, jgg, linux-integrity, pmenzel, kgold, zohar,
	why2jjj.linux, hamza, gregkh, arnd, nayna, James.Bottomley

This is a fix for the ATMEL TPM crash bug reported in
https://patchwork.kernel.org/project/linux-integrity/patch/20200926223150.109645-1-hao.wu@rubrik.com/

According to the discussions in the original thread,
we don't want to revert the timeout of wait_for_tpm_stat
for non-ATMEL chips, which brings back the performance cost.
For investigation and analysis of why wait_for_tpm_stat
caused the issue, and how the regression was introduced,
please read the original thread above.

Thus the proposed fix here is to only revert the timeout
for ATMEL chips by checking the vendor ID.
---
 drivers/char/tpm/tpm.h          |  9 ++++++++-
 drivers/char/tpm/tpm_tis_core.c | 19 +++++++++++++++++--
 include/linux/tpm.h             |  2 ++
 3 files changed, 27 insertions(+), 3 deletions(-)

diff --git a/drivers/char/tpm/tpm.h b/drivers/char/tpm/tpm.h
index 283f78211c3a..bc6aa7f9e119 100644
--- a/drivers/char/tpm/tpm.h
+++ b/drivers/char/tpm/tpm.h
@@ -42,7 +42,9 @@ enum tpm_timeout {
 	TPM_TIMEOUT_RANGE_US = 300,	/* usecs */
 	TPM_TIMEOUT_POLL = 1,	/* msecs */
 	TPM_TIMEOUT_USECS_MIN = 100,      /* usecs */
-	TPM_TIMEOUT_USECS_MAX = 500      /* usecs */
+	TPM_TIMEOUT_USECS_MAX = 500,	/* usecs */
+	TPM_TIMEOUT_WAIT_STAT = 500,	/* usecs */
+	TPM_ATML_TIMEOUT_WAIT_STAT = 15000	/* usecs */
 };
 
 /* TPM addresses */
@@ -189,6 +191,11 @@ static inline void tpm_msleep(unsigned int delay_msec)
 		     delay_msec * 1000);
 };
 
+static inline void tpm_usleep(unsigned int delay_usec)
+{
+	usleep_range(delay_usec - TPM_TIMEOUT_RANGE_US, delay_usec);
+};
+
 int tpm_chip_start(struct tpm_chip *chip);
 void tpm_chip_stop(struct tpm_chip *chip);
 struct tpm_chip *tpm_find_get_ops(struct tpm_chip *chip);
diff --git a/drivers/char/tpm/tpm_tis_core.c b/drivers/char/tpm/tpm_tis_core.c
index 55b9d3965ae1..9ddd4edfe1c2 100644
--- a/drivers/char/tpm/tpm_tis_core.c
+++ b/drivers/char/tpm/tpm_tis_core.c
@@ -80,8 +80,12 @@ static int wait_for_tpm_stat(struct tpm_chip *chip, u8 mask,
 		}
 	} else {
 		do {
-			usleep_range(TPM_TIMEOUT_USECS_MIN,
-				     TPM_TIMEOUT_USECS_MAX);
+			if (chip->timeout_wait_stat && 
+				chip->timeout_wait_stat >= TPM_TIMEOUT_WAIT_STAT) {
+				tpm_usleep((unsigned int)(chip->timeout_wait_stat));
+			} else {
+				tpm_usleep((unsigned int)(TPM_TIMEOUT_WAIT_STAT));
+			}
 			status = chip->ops->status(chip);
 			if ((status & mask) == mask)
 				return 0;
@@ -934,6 +938,8 @@ int tpm_tis_core_init(struct device *dev, struct tpm_tis_data *priv, int irq,
 	chip->timeout_b = msecs_to_jiffies(TIS_TIMEOUT_B_MAX);
 	chip->timeout_c = msecs_to_jiffies(TIS_TIMEOUT_C_MAX);
 	chip->timeout_d = msecs_to_jiffies(TIS_TIMEOUT_D_MAX);
+	/* init timeout for wait_for_tpm_stat */
+	chip->timeout_wait_stat = TPM_TIMEOUT_WAIT_STAT;
 	priv->phy_ops = phy_ops;
 	dev_set_drvdata(&chip->dev, priv);
 
@@ -983,6 +989,15 @@ int tpm_tis_core_init(struct device *dev, struct tpm_tis_data *priv, int irq,
 
 	priv->manufacturer_id = vendor;
 
+	switch (priv->manufacturer_id) {
+	case TPM_VID_ATML:
+        /* ATMEL chip needs longer timeout to avoid crash */
+		chip->timeout_wait_stat = TPM_ATML_TIMEOUT_WAIT_STAT;
+		break;
+	default:
+		chip->timeout_wait_stat = TPM_TIMEOUT_WAIT_STAT;
+	}
+
 	rc = tpm_tis_read8(priv, TPM_RID(0), &rid);
 	if (rc < 0)
 		goto out_err;
diff --git a/include/linux/tpm.h b/include/linux/tpm.h
index aa11fe323c56..35f2a0260d76 100644
--- a/include/linux/tpm.h
+++ b/include/linux/tpm.h
@@ -150,6 +150,7 @@ struct tpm_chip {
 	bool timeout_adjusted;
 	unsigned long duration[TPM_NUM_DURATIONS]; /* jiffies */
 	bool duration_adjusted;
+	unsigned long timeout_wait_stat; /* usecs */
 
 	struct dentry *bios_dir[TPM_NUM_EVENT_LOG_FILES];
 
@@ -269,6 +270,7 @@ enum tpm2_cc_attrs {
 #define TPM_VID_INTEL    0x8086
 #define TPM_VID_WINBOND  0x1050
 #define TPM_VID_STM      0x104A
+#define TPM_VID_ATML     0x1114
 
 enum tpm_chip_flags {
 	TPM_CHIP_FLAG_TPM2		= BIT(1),
-- 
2.29.0.vfs.0.0


^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: [PATCH] Fix Atmel TPM crash caused by too frequent queries
  2021-06-23 13:35 ` Jarkko Sakkinen
@ 2021-06-24  5:49   ` Hao Wu
  2021-06-29 20:06     ` Jarkko Sakkinen
  0 siblings, 1 reply; 91+ messages in thread
From: Hao Wu @ 2021-06-24  5:49 UTC (permalink / raw)
  To: Jarkko Sakkinen
  Cc: Shrihari Kalkar, Seungyeop Han, Anish Jhaveri, peterhuewe, jgg,
	linux-integrity, Paul Menzel, Ken Goldman, zohar, why2jjj.linux,
	Hamza Attak, gregkh, arnd, Nayna, James.Bottomley

> On Jun 23, 2021, at 6:35 AM, Jarkko Sakkinen <jarkko@kernel.org> wrote:
> 
> On Sun, Jun 20, 2021 at 04:18:09PM -0700, Hao Wu wrote:
>> This is a fix for the ATMEL TPM crash bug reported in
>> https://patchwork.kernel.org/project/linux-integrity/patch/20200926223150.109645-1-hao.wu@rubrik.com/
>> 
>> According to the discussions in the original thread,
>> we don't want to revert the timeout of wait_for_tpm_stat
>> for non-ATMEL chips, which brings back the performance cost.
>> For investigation and analysis of why wait_for_tpm_stat
>> caused the issue, and how the regression was introduced,
>> please read the original thread above.
>> 
>> Thus the proposed fix here is to only revert the timeout
>> for ATMEL chips by checking the vendor ID.
>> 
>> Test Plan:
>> - Run fixed kernel with ATMEL TPM chips and see crash
>>  has been fixed.
>> - Run fixed kernel with non-ATMEL TPM chips, and confirm
>>  the timeout has not been changed.
> 
> Please move test plan right before diffstat if you wan to include such,
> so that it does not go into the commit log.
Hi Jarkko, not sure I understood your suggestion or not. I removed
the test plan from the commit message in a updated commit
https://patchwork.kernel.org/project/linux-integrity/patch/20210624053321.861-1-hao.wu@rubrik.com/

Let me know if I misunderstood this. I am fine to not include test plan,
If this is not something expected by linux community.
I personally think it is helpful to understand the confidence of the commit.

> 
>> ---
>> drivers/char/tpm/tpm.h          |  9 ++++++++-
>> drivers/char/tpm/tpm_tis_core.c | 19 +++++++++++++++++--
>> include/linux/tpm.h             |  2 ++
>> 3 files changed, 27 insertions(+), 3 deletions(-)
>> 
>> diff --git a/drivers/char/tpm/tpm.h b/drivers/char/tpm/tpm.h
>> index 283f78211c3a..bc6aa7f9e119 100644
>> --- a/drivers/char/tpm/tpm.h
>> +++ b/drivers/char/tpm/tpm.h
>> @@ -42,7 +42,9 @@ enum tpm_timeout {
>> 	TPM_TIMEOUT_RANGE_US = 300,	/* usecs */
>> 	TPM_TIMEOUT_POLL = 1,	/* msecs */
>> 	TPM_TIMEOUT_USECS_MIN = 100,      /* usecs */
>> -	TPM_TIMEOUT_USECS_MAX = 500      /* usecs */
>> +	TPM_TIMEOUT_USECS_MAX = 500,	/* usecs */
>> +	TPM_TIMEOUT_WAIT_STAT = 500,	/* usecs */
>> +	TPM_ATML_TIMEOUT_WAIT_STAT = 15000	/* usecs */
>> };
>> 
>> /* TPM addresses */
>> @@ -189,6 +191,11 @@ static inline void tpm_msleep(unsigned int delay_msec)
>> 		     delay_msec * 1000);
>> };
>> 
>> +static inline void tpm_usleep(unsigned int delay_usec)
>> +{
>> +	usleep_range(delay_usec - TPM_TIMEOUT_RANGE_US, delay_usec);
>> +};
>> +
>> int tpm_chip_start(struct tpm_chip *chip);
>> void tpm_chip_stop(struct tpm_chip *chip);
>> struct tpm_chip *tpm_find_get_ops(struct tpm_chip *chip);
>> diff --git a/drivers/char/tpm/tpm_tis_core.c b/drivers/char/tpm/tpm_tis_core.c
>> index 55b9d3965ae1..9ddd4edfe1c2 100644
>> --- a/drivers/char/tpm/tpm_tis_core.c
>> +++ b/drivers/char/tpm/tpm_tis_core.c
>> @@ -80,8 +80,12 @@ static int wait_for_tpm_stat(struct tpm_chip *chip, u8 mask,
>> 		}
>> 	} else {
>> 		do {
>> -			usleep_range(TPM_TIMEOUT_USECS_MIN,
>> -				     TPM_TIMEOUT_USECS_MAX);
>> +			if (chip->timeout_wait_stat && 
>> +				chip->timeout_wait_stat >= TPM_TIMEOUT_WAIT_STAT) {
>> +				tpm_usleep((unsigned int)(chip->timeout_wait_stat));
>> +			} else {
>> +				tpm_usleep((unsigned int)(TPM_TIMEOUT_WAIT_STAT));
>> +			}
>> 			status = chip->ops->status(chip);
>> 			if ((status & mask) == mask)
>> 				return 0;
>> @@ -934,6 +938,8 @@ int tpm_tis_core_init(struct device *dev, struct tpm_tis_data *priv, int irq,
>> 	chip->timeout_b = msecs_to_jiffies(TIS_TIMEOUT_B_MAX);
>> 	chip->timeout_c = msecs_to_jiffies(TIS_TIMEOUT_C_MAX);
>> 	chip->timeout_d = msecs_to_jiffies(TIS_TIMEOUT_D_MAX);
>> +	/* init timeout for wait_for_tpm_stat */
>> +	chip->timeout_wait_stat = TPM_TIMEOUT_WAIT_STAT;
>> 	priv->phy_ops = phy_ops;
>> 	dev_set_drvdata(&chip->dev, priv);
>> 
>> @@ -983,6 +989,15 @@ int tpm_tis_core_init(struct device *dev, struct tpm_tis_data *priv, int irq,
>> 
>> 	priv->manufacturer_id = vendor;
>> 
>> +	switch (priv->manufacturer_id) {
>> +	case TPM_VID_ATML:
>> +        /* ATMEL chip needs longer timeout to avoid crash */
>> +		chip->timeout_wait_stat = TPM_ATML_TIMEOUT_WAIT_STAT;
>> +		break;
>> +	default:
>> +		chip->timeout_wait_stat = TPM_TIMEOUT_WAIT_STAT;
>> +	}
>> +
>> 	rc = tpm_tis_read8(priv, TPM_RID(0), &rid);
>> 	if (rc < 0)
>> 		goto out_err;
>> diff --git a/include/linux/tpm.h b/include/linux/tpm.h
>> index aa11fe323c56..35f2a0260d76 100644
>> --- a/include/linux/tpm.h
>> +++ b/include/linux/tpm.h
>> @@ -150,6 +150,7 @@ struct tpm_chip {
>> 	bool timeout_adjusted;
>> 	unsigned long duration[TPM_NUM_DURATIONS]; /* jiffies */
>> 	bool duration_adjusted;
>> +	unsigned long timeout_wait_stat; /* usecs */
>> 
>> 	struct dentry *bios_dir[TPM_NUM_EVENT_LOG_FILES];
>> 
>> @@ -269,6 +270,7 @@ enum tpm2_cc_attrs {
>> #define TPM_VID_INTEL    0x8086
>> #define TPM_VID_WINBOND  0x1050
>> #define TPM_VID_STM      0x104A
>> +#define TPM_VID_ATML     0x1114
>> 
>> enum tpm_chip_flags {
>> 	TPM_CHIP_FLAG_TPM2		= BIT(1),
>> -- 
>> 2.29.0.vfs.0.0
>> 
>> 
> 
> /Jarkko

Hao


^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: [PATCH] Fix Atmel TPM crash caused by too frequent queries
  2021-06-24  5:49   ` Hao Wu
@ 2021-06-29 20:06     ` Jarkko Sakkinen
  2021-06-30  4:27       ` Hao Wu
  0 siblings, 1 reply; 91+ messages in thread
From: Jarkko Sakkinen @ 2021-06-29 20:06 UTC (permalink / raw)
  To: Hao Wu
  Cc: Shrihari Kalkar, Seungyeop Han, Anish Jhaveri, peterhuewe, jgg,
	linux-integrity, Paul Menzel, Ken Goldman, zohar, why2jjj.linux,
	Hamza Attak, gregkh, arnd, Nayna, James.Bottomley

On Wed, Jun 23, 2021 at 10:49:27PM -0700, Hao Wu wrote:
> > On Jun 23, 2021, at 6:35 AM, Jarkko Sakkinen <jarkko@kernel.org> wrote:
> > 
> > On Sun, Jun 20, 2021 at 04:18:09PM -0700, Hao Wu wrote:
> >> This is a fix for the ATMEL TPM crash bug reported in
> >> https://patchwork.kernel.org/project/linux-integrity/patch/20200926223150.109645-1-hao.wu@rubrik.com/
> >> 
> >> According to the discussions in the original thread,
> >> we don't want to revert the timeout of wait_for_tpm_stat
> >> for non-ATMEL chips, which brings back the performance cost.
> >> For investigation and analysis of why wait_for_tpm_stat
> >> caused the issue, and how the regression was introduced,
> >> please read the original thread above.
> >> 
> >> Thus the proposed fix here is to only revert the timeout
> >> for ATMEL chips by checking the vendor ID.
> >> 
> >> Test Plan:
> >> - Run fixed kernel with ATMEL TPM chips and see crash
> >>  has been fixed.
> >> - Run fixed kernel with non-ATMEL TPM chips, and confirm
> >>  the timeout has not been changed.
> > 
> > Please move test plan right before diffstat if you wan to include such,
> > so that it does not go into the commit log.
> Hi Jarkko, not sure I understood your suggestion or not. I removed
> the test plan from the commit message in a updated commit
> https://patchwork.kernel.org/project/linux-integrity/patch/20210624053321.861-1-hao.wu@rubrik.com/
> 
> Let me know if I misunderstood this. I am fine to not include test plan,
> If this is not something expected by linux community.
> I personally think it is helpful to understand the confidence of the commit.
> 
> > 
> >> ---

You can add it right here. Then it won't be included to the actual
commit log but is still available in the patch.

/Jarkko 

^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: [PATCH] Fix Atmel TPM crash caused by too frequent queries
  2021-06-24  5:33 ` Hao Wu
@ 2021-06-29 20:07   ` Jarkko Sakkinen
  2021-06-30  4:22   ` [PATCH] tpm: fix ATMEL " Hao Wu
  1 sibling, 0 replies; 91+ messages in thread
From: Jarkko Sakkinen @ 2021-06-29 20:07 UTC (permalink / raw)
  To: Hao Wu
  Cc: shrihari.kalkar, seungyeop.han, anish.jhaveri, peterhuewe, jgg,
	linux-integrity, pmenzel, kgold, zohar, why2jjj.linux, hamza,
	gregkh, arnd, nayna, James.Bottomley

On Wed, Jun 23, 2021 at 10:33:21PM -0700, Hao Wu wrote:
> This is a fix for the ATMEL TPM crash bug reported in
> https://patchwork.kernel.org/project/linux-integrity/patch/20200926223150.109645-1-hao.wu@rubrik.com/
> 
> According to the discussions in the original thread,
> we don't want to revert the timeout of wait_for_tpm_stat
> for non-ATMEL chips, which brings back the performance cost.
> For investigation and analysis of why wait_for_tpm_stat
> caused the issue, and how the regression was introduced,
> please read the original thread above.
> 
> Thus the proposed fix here is to only revert the timeout
> for ATMEL chips by checking the vendor ID.
> ---

Lacking "Fixes" and "Signed-off-by".

/Jarkko

>  drivers/char/tpm/tpm.h          |  9 ++++++++-
>  drivers/char/tpm/tpm_tis_core.c | 19 +++++++++++++++++--
>  include/linux/tpm.h             |  2 ++
>  3 files changed, 27 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/char/tpm/tpm.h b/drivers/char/tpm/tpm.h
> index 283f78211c3a..bc6aa7f9e119 100644
> --- a/drivers/char/tpm/tpm.h
> +++ b/drivers/char/tpm/tpm.h
> @@ -42,7 +42,9 @@ enum tpm_timeout {
>  	TPM_TIMEOUT_RANGE_US = 300,	/* usecs */
>  	TPM_TIMEOUT_POLL = 1,	/* msecs */
>  	TPM_TIMEOUT_USECS_MIN = 100,      /* usecs */
> -	TPM_TIMEOUT_USECS_MAX = 500      /* usecs */
> +	TPM_TIMEOUT_USECS_MAX = 500,	/* usecs */
> +	TPM_TIMEOUT_WAIT_STAT = 500,	/* usecs */
> +	TPM_ATML_TIMEOUT_WAIT_STAT = 15000	/* usecs */
>  };
>  
>  /* TPM addresses */
> @@ -189,6 +191,11 @@ static inline void tpm_msleep(unsigned int delay_msec)
>  		     delay_msec * 1000);
>  };
>  
> +static inline void tpm_usleep(unsigned int delay_usec)
> +{
> +	usleep_range(delay_usec - TPM_TIMEOUT_RANGE_US, delay_usec);
> +};
> +
>  int tpm_chip_start(struct tpm_chip *chip);
>  void tpm_chip_stop(struct tpm_chip *chip);
>  struct tpm_chip *tpm_find_get_ops(struct tpm_chip *chip);
> diff --git a/drivers/char/tpm/tpm_tis_core.c b/drivers/char/tpm/tpm_tis_core.c
> index 55b9d3965ae1..9ddd4edfe1c2 100644
> --- a/drivers/char/tpm/tpm_tis_core.c
> +++ b/drivers/char/tpm/tpm_tis_core.c
> @@ -80,8 +80,12 @@ static int wait_for_tpm_stat(struct tpm_chip *chip, u8 mask,
>  		}
>  	} else {
>  		do {
> -			usleep_range(TPM_TIMEOUT_USECS_MIN,
> -				     TPM_TIMEOUT_USECS_MAX);
> +			if (chip->timeout_wait_stat && 
> +				chip->timeout_wait_stat >= TPM_TIMEOUT_WAIT_STAT) {
> +				tpm_usleep((unsigned int)(chip->timeout_wait_stat));
> +			} else {
> +				tpm_usleep((unsigned int)(TPM_TIMEOUT_WAIT_STAT));
> +			}
>  			status = chip->ops->status(chip);
>  			if ((status & mask) == mask)
>  				return 0;
> @@ -934,6 +938,8 @@ int tpm_tis_core_init(struct device *dev, struct tpm_tis_data *priv, int irq,
>  	chip->timeout_b = msecs_to_jiffies(TIS_TIMEOUT_B_MAX);
>  	chip->timeout_c = msecs_to_jiffies(TIS_TIMEOUT_C_MAX);
>  	chip->timeout_d = msecs_to_jiffies(TIS_TIMEOUT_D_MAX);
> +	/* init timeout for wait_for_tpm_stat */
> +	chip->timeout_wait_stat = TPM_TIMEOUT_WAIT_STAT;
>  	priv->phy_ops = phy_ops;
>  	dev_set_drvdata(&chip->dev, priv);
>  
> @@ -983,6 +989,15 @@ int tpm_tis_core_init(struct device *dev, struct tpm_tis_data *priv, int irq,
>  
>  	priv->manufacturer_id = vendor;
>  
> +	switch (priv->manufacturer_id) {
> +	case TPM_VID_ATML:
> +        /* ATMEL chip needs longer timeout to avoid crash */
> +		chip->timeout_wait_stat = TPM_ATML_TIMEOUT_WAIT_STAT;
> +		break;
> +	default:
> +		chip->timeout_wait_stat = TPM_TIMEOUT_WAIT_STAT;
> +	}
> +
>  	rc = tpm_tis_read8(priv, TPM_RID(0), &rid);
>  	if (rc < 0)
>  		goto out_err;
> diff --git a/include/linux/tpm.h b/include/linux/tpm.h
> index aa11fe323c56..35f2a0260d76 100644
> --- a/include/linux/tpm.h
> +++ b/include/linux/tpm.h
> @@ -150,6 +150,7 @@ struct tpm_chip {
>  	bool timeout_adjusted;
>  	unsigned long duration[TPM_NUM_DURATIONS]; /* jiffies */
>  	bool duration_adjusted;
> +	unsigned long timeout_wait_stat; /* usecs */
>  
>  	struct dentry *bios_dir[TPM_NUM_EVENT_LOG_FILES];
>  
> @@ -269,6 +270,7 @@ enum tpm2_cc_attrs {
>  #define TPM_VID_INTEL    0x8086
>  #define TPM_VID_WINBOND  0x1050
>  #define TPM_VID_STM      0x104A
> +#define TPM_VID_ATML     0x1114
>  
>  enum tpm_chip_flags {
>  	TPM_CHIP_FLAG_TPM2		= BIT(1),
> -- 
> 2.29.0.vfs.0.0
> 
> 

^ permalink raw reply	[flat|nested] 91+ messages in thread

* [PATCH] tpm: fix ATMEL TPM crash caused by too frequent queries
  2021-06-24  5:33 ` Hao Wu
  2021-06-29 20:07   ` Jarkko Sakkinen
@ 2021-06-30  4:22   ` Hao Wu
  2021-07-02  6:35     ` Jarkko Sakkinen
                       ` (3 more replies)
  1 sibling, 4 replies; 91+ messages in thread
From: Hao Wu @ 2021-06-30  4:22 UTC (permalink / raw)
  To: hao.wu, shrihari.kalkar, seungyeop.han, anish.jhaveri,
	peterhuewe, jarkko, jgg, linux-integrity, pmenzel, kgold, zohar,
	why2jjj.linux, hamza, gregkh, arnd, nayna, James.Bottomley

This is a fix for the ATMEL TPM crash bug reported in
https://patchwork.kernel.org/project/linux-integrity/patch/20200926223150.109645-1-hao.wu@rubrik.com/

According to the discussions in the original thread,
we don't want to revert the timeout of wait_for_tpm_stat
for non-ATMEL chips, which brings back the performance cost.
For investigation and analysis of why wait_for_tpm_stat
caused the issue, and how the regression was introduced,
please read the original thread above.

Thus the proposed fix here is to only revert the timeout
for ATMEL chips by checking the vendor ID.

Signed-off-by: Hao Wu <hao.wu@rubrik.com>
Fixes: 9f3fc7bcddcb ("tpm: replace msleep() with usleep_range() in TPM 1.2/2.0 generic drivers")
---
Test Plan:
- Run fixed kernel with ATMEL TPM chips and see crash
has been fixed.
- Run fixed kernel with non-ATMEL TPM chips, and confirm
the timeout has not been changed.

 drivers/char/tpm/tpm.h          |  9 ++++++++-
 drivers/char/tpm/tpm_tis_core.c | 19 +++++++++++++++++--
 include/linux/tpm.h             |  2 ++
 3 files changed, 27 insertions(+), 3 deletions(-)

diff --git a/drivers/char/tpm/tpm.h b/drivers/char/tpm/tpm.h
index 283f78211c3a..bc6aa7f9e119 100644
--- a/drivers/char/tpm/tpm.h
+++ b/drivers/char/tpm/tpm.h
@@ -42,7 +42,9 @@ enum tpm_timeout {
 	TPM_TIMEOUT_RANGE_US = 300,	/* usecs */
 	TPM_TIMEOUT_POLL = 1,	/* msecs */
 	TPM_TIMEOUT_USECS_MIN = 100,      /* usecs */
-	TPM_TIMEOUT_USECS_MAX = 500      /* usecs */
+	TPM_TIMEOUT_USECS_MAX = 500,	/* usecs */
+	TPM_TIMEOUT_WAIT_STAT = 500,	/* usecs */
+	TPM_ATML_TIMEOUT_WAIT_STAT = 15000	/* usecs */
 };
 
 /* TPM addresses */
@@ -189,6 +191,11 @@ static inline void tpm_msleep(unsigned int delay_msec)
 		     delay_msec * 1000);
 };
 
+static inline void tpm_usleep(unsigned int delay_usec)
+{
+	usleep_range(delay_usec - TPM_TIMEOUT_RANGE_US, delay_usec);
+};
+
 int tpm_chip_start(struct tpm_chip *chip);
 void tpm_chip_stop(struct tpm_chip *chip);
 struct tpm_chip *tpm_find_get_ops(struct tpm_chip *chip);
diff --git a/drivers/char/tpm/tpm_tis_core.c b/drivers/char/tpm/tpm_tis_core.c
index 55b9d3965ae1..9ddd4edfe1c2 100644
--- a/drivers/char/tpm/tpm_tis_core.c
+++ b/drivers/char/tpm/tpm_tis_core.c
@@ -80,8 +80,12 @@ static int wait_for_tpm_stat(struct tpm_chip *chip, u8 mask,
 		}
 	} else {
 		do {
-			usleep_range(TPM_TIMEOUT_USECS_MIN,
-				     TPM_TIMEOUT_USECS_MAX);
+			if (chip->timeout_wait_stat && 
+				chip->timeout_wait_stat >= TPM_TIMEOUT_WAIT_STAT) {
+				tpm_usleep((unsigned int)(chip->timeout_wait_stat));
+			} else {
+				tpm_usleep((unsigned int)(TPM_TIMEOUT_WAIT_STAT));
+			}
 			status = chip->ops->status(chip);
 			if ((status & mask) == mask)
 				return 0;
@@ -934,6 +938,8 @@ int tpm_tis_core_init(struct device *dev, struct tpm_tis_data *priv, int irq,
 	chip->timeout_b = msecs_to_jiffies(TIS_TIMEOUT_B_MAX);
 	chip->timeout_c = msecs_to_jiffies(TIS_TIMEOUT_C_MAX);
 	chip->timeout_d = msecs_to_jiffies(TIS_TIMEOUT_D_MAX);
+	/* init timeout for wait_for_tpm_stat */
+	chip->timeout_wait_stat = TPM_TIMEOUT_WAIT_STAT;
 	priv->phy_ops = phy_ops;
 	dev_set_drvdata(&chip->dev, priv);
 
@@ -983,6 +989,15 @@ int tpm_tis_core_init(struct device *dev, struct tpm_tis_data *priv, int irq,
 
 	priv->manufacturer_id = vendor;
 
+	switch (priv->manufacturer_id) {
+	case TPM_VID_ATML:
+        /* ATMEL chip needs longer timeout to avoid crash */
+		chip->timeout_wait_stat = TPM_ATML_TIMEOUT_WAIT_STAT;
+		break;
+	default:
+		chip->timeout_wait_stat = TPM_TIMEOUT_WAIT_STAT;
+	}
+
 	rc = tpm_tis_read8(priv, TPM_RID(0), &rid);
 	if (rc < 0)
 		goto out_err;
diff --git a/include/linux/tpm.h b/include/linux/tpm.h
index aa11fe323c56..35f2a0260d76 100644
--- a/include/linux/tpm.h
+++ b/include/linux/tpm.h
@@ -150,6 +150,7 @@ struct tpm_chip {
 	bool timeout_adjusted;
 	unsigned long duration[TPM_NUM_DURATIONS]; /* jiffies */
 	bool duration_adjusted;
+	unsigned long timeout_wait_stat; /* usecs */
 
 	struct dentry *bios_dir[TPM_NUM_EVENT_LOG_FILES];
 
@@ -269,6 +270,7 @@ enum tpm2_cc_attrs {
 #define TPM_VID_INTEL    0x8086
 #define TPM_VID_WINBOND  0x1050
 #define TPM_VID_STM      0x104A
+#define TPM_VID_ATML     0x1114
 
 enum tpm_chip_flags {
 	TPM_CHIP_FLAG_TPM2		= BIT(1),
-- 
2.29.0.vfs.0.0


^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: [PATCH] Fix Atmel TPM crash caused by too frequent queries
  2021-06-29 20:06     ` Jarkko Sakkinen
@ 2021-06-30  4:27       ` Hao Wu
  0 siblings, 0 replies; 91+ messages in thread
From: Hao Wu @ 2021-06-30  4:27 UTC (permalink / raw)
  To: Jarkko Sakkinen
  Cc: Shrihari Kalkar, Seungyeop Han, Anish Jhaveri, peterhuewe, jgg,
	linux-integrity, Paul Menzel, Ken Goldman, zohar, why2jjj.linux,
	Hamza Attak, gregkh, arnd, Nayna, James.Bottomley

> On Jun 29, 2021, at 1:06 PM, Jarkko Sakkinen <jarkko@kernel.org> wrote:
> 
> On Wed, Jun 23, 2021 at 10:49:27PM -0700, Hao Wu wrote:
>>> On Jun 23, 2021, at 6:35 AM, Jarkko Sakkinen <jarkko@kernel.org> wrote:
>>> 
>>> On Sun, Jun 20, 2021 at 04:18:09PM -0700, Hao Wu wrote:
>>>> This is a fix for the ATMEL TPM crash bug reported in
>>>> https://patchwork.kernel.org/project/linux-integrity/patch/20200926223150.109645-1-hao.wu@rubrik.com/
>>>> 
>>>> According to the discussions in the original thread,
>>>> we don't want to revert the timeout of wait_for_tpm_stat
>>>> for non-ATMEL chips, which brings back the performance cost.
>>>> For investigation and analysis of why wait_for_tpm_stat
>>>> caused the issue, and how the regression was introduced,
>>>> please read the original thread above.
>>>> 
>>>> Thus the proposed fix here is to only revert the timeout
>>>> for ATMEL chips by checking the vendor ID.
>>>> 
>>>> Test Plan:
>>>> - Run fixed kernel with ATMEL TPM chips and see crash
>>>> has been fixed.
>>>> - Run fixed kernel with non-ATMEL TPM chips, and confirm
>>>> the timeout has not been changed.
>>> 
>>> Please move test plan right before diffstat if you wan to include such,
>>> so that it does not go into the commit log.
>> Hi Jarkko, not sure I understood your suggestion or not. I removed
>> the test plan from the commit message in a updated commit
>> https://patchwork.kernel.org/project/linux-integrity/patch/20210624053321.861-1-hao.wu@rubrik.com/
>> 
>> Let me know if I misunderstood this. I am fine to not include test plan,
>> If this is not something expected by linux community.
>> I personally think it is helpful to understand the confidence of the commit.
>> 
>>> 
>>>> ---
> 
> You can add it right here. Then it won't be included to the actual
> commit log but is still available in the patch.
> 
I see, thanks Jarkko. Updated the patch
https://patchwork.kernel.org/project/linux-integrity/patch/20210630042205.30051-1-hao.wu@rubrik.com/
Hopefull it makes more sense now.

> /Jarkko 

Hao

^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: [PATCH] tpm: fix ATMEL TPM crash caused by too frequent queries
  2021-06-30  4:22   ` [PATCH] tpm: fix ATMEL " Hao Wu
@ 2021-07-02  6:35     ` Jarkko Sakkinen
  2021-07-02  7:12       ` Greg KH
  2021-07-02  7:33       ` Hao Wu
  2021-07-04  0:07     ` Hao Wu
                       ` (2 subsequent siblings)
  3 siblings, 2 replies; 91+ messages in thread
From: Jarkko Sakkinen @ 2021-07-02  6:35 UTC (permalink / raw)
  To: Hao Wu
  Cc: shrihari.kalkar, seungyeop.han, anish.jhaveri, peterhuewe, jgg,
	linux-integrity, pmenzel, kgold, zohar, why2jjj.linux, hamza,
	gregkh, arnd, nayna, James.Bottomley

On Tue, Jun 29, 2021 at 09:22:05PM -0700, Hao Wu wrote:
> This is a fix for the ATMEL TPM crash bug reported in
> https://patchwork.kernel.org/project/linux-integrity/patch/20200926223150.109645-1-hao.wu@rubrik.com/
> 
> According to the discussions in the original thread,
> we don't want to revert the timeout of wait_for_tpm_stat
> for non-ATMEL chips, which brings back the performance cost.
> For investigation and analysis of why wait_for_tpm_stat
> caused the issue, and how the regression was introduced,
> please read the original thread above.
> 
> Thus the proposed fix here is to only revert the timeout
> for ATMEL chips by checking the vendor ID.
> 
> Signed-off-by: Hao Wu <hao.wu@rubrik.com>
> Fixes: 9f3fc7bcddcb ("tpm: replace msleep() with usleep_range() in TPM 1.2/2.0 generic drivers")

Fixes tag should be before SOB.

> ---
> Test Plan:
> - Run fixed kernel with ATMEL TPM chips and see crash
> has been fixed.
> - Run fixed kernel with non-ATMEL TPM chips, and confirm
> the timeout has not been changed.
> 
>  drivers/char/tpm/tpm.h          |  9 ++++++++-
>  drivers/char/tpm/tpm_tis_core.c | 19 +++++++++++++++++--
>  include/linux/tpm.h             |  2 ++
>  3 files changed, 27 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/char/tpm/tpm.h b/drivers/char/tpm/tpm.h
> index 283f78211c3a..bc6aa7f9e119 100644
> --- a/drivers/char/tpm/tpm.h
> +++ b/drivers/char/tpm/tpm.h
> @@ -42,7 +42,9 @@ enum tpm_timeout {
>  	TPM_TIMEOUT_RANGE_US = 300,	/* usecs */
>  	TPM_TIMEOUT_POLL = 1,	/* msecs */
>  	TPM_TIMEOUT_USECS_MIN = 100,      /* usecs */
> -	TPM_TIMEOUT_USECS_MAX = 500      /* usecs */
> +	TPM_TIMEOUT_USECS_MAX = 500,	/* usecs */

What is this change?

> +	TPM_TIMEOUT_WAIT_STAT = 500,	/* usecs */
> +	TPM_ATML_TIMEOUT_WAIT_STAT = 15000	/* usecs */
>  };
>  
>  /* TPM addresses */
> @@ -189,6 +191,11 @@ static inline void tpm_msleep(unsigned int delay_msec)
>  		     delay_msec * 1000);
>  };
>  
> +static inline void tpm_usleep(unsigned int delay_usec)
> +{
> +	usleep_range(delay_usec - TPM_TIMEOUT_RANGE_US, delay_usec);
> +};

Please remove this, and open code.

> +
>  int tpm_chip_start(struct tpm_chip *chip);
>  void tpm_chip_stop(struct tpm_chip *chip);
>  struct tpm_chip *tpm_find_get_ops(struct tpm_chip *chip);
> diff --git a/drivers/char/tpm/tpm_tis_core.c b/drivers/char/tpm/tpm_tis_core.c
> index 55b9d3965ae1..9ddd4edfe1c2 100644
> --- a/drivers/char/tpm/tpm_tis_core.c
> +++ b/drivers/char/tpm/tpm_tis_core.c
> @@ -80,8 +80,12 @@ static int wait_for_tpm_stat(struct tpm_chip *chip, u8 mask,
>  		}
>  	} else {
>  		do {
> -			usleep_range(TPM_TIMEOUT_USECS_MIN,
> -				     TPM_TIMEOUT_USECS_MAX);
> +			if (chip->timeout_wait_stat && 
> +				chip->timeout_wait_stat >= TPM_TIMEOUT_WAIT_STAT) {
> +				tpm_usleep((unsigned int)(chip->timeout_wait_stat));
> +			} else {
> +				tpm_usleep((unsigned int)(TPM_TIMEOUT_WAIT_STAT));
> +			}

Invalid use of braces. Please read

https://www.kernel.org/doc/html/v5.13/process/coding-style.html

Why do you have to use this field conditionally anyway? Why doesn't
it always contain a legit value?

>  			status = chip->ops->status(chip);
>  			if ((status & mask) == mask)
>  				return 0;
> @@ -934,6 +938,8 @@ int tpm_tis_core_init(struct device *dev, struct tpm_tis_data *priv, int irq,
>  	chip->timeout_b = msecs_to_jiffies(TIS_TIMEOUT_B_MAX);
>  	chip->timeout_c = msecs_to_jiffies(TIS_TIMEOUT_C_MAX);
>  	chip->timeout_d = msecs_to_jiffies(TIS_TIMEOUT_D_MAX);
> +	/* init timeout for wait_for_tpm_stat */
> +	chip->timeout_wait_stat = TPM_TIMEOUT_WAIT_STAT;
>  	priv->phy_ops = phy_ops;
>  	dev_set_drvdata(&chip->dev, priv);
>  
> @@ -983,6 +989,15 @@ int tpm_tis_core_init(struct device *dev, struct tpm_tis_data *priv, int irq,
>  
>  	priv->manufacturer_id = vendor;
>  
> +	switch (priv->manufacturer_id) {
> +	case TPM_VID_ATML:
> +        /* ATMEL chip needs longer timeout to avoid crash */
> +		chip->timeout_wait_stat = TPM_ATML_TIMEOUT_WAIT_STAT;
> +		break;
> +	default:
> +		chip->timeout_wait_stat = TPM_TIMEOUT_WAIT_STAT;
> +	}
> +
>  	rc = tpm_tis_read8(priv, TPM_RID(0), &rid);
>  	if (rc < 0)
>  		goto out_err;
> diff --git a/include/linux/tpm.h b/include/linux/tpm.h
> index aa11fe323c56..35f2a0260d76 100644
> --- a/include/linux/tpm.h
> +++ b/include/linux/tpm.h
> @@ -150,6 +150,7 @@ struct tpm_chip {
>  	bool timeout_adjusted;
>  	unsigned long duration[TPM_NUM_DURATIONS]; /* jiffies */
>  	bool duration_adjusted;
> +	unsigned long timeout_wait_stat; /* usecs */
>  
>  	struct dentry *bios_dir[TPM_NUM_EVENT_LOG_FILES];
>  
> @@ -269,6 +270,7 @@ enum tpm2_cc_attrs {
>  #define TPM_VID_INTEL    0x8086
>  #define TPM_VID_WINBOND  0x1050
>  #define TPM_VID_STM      0x104A
> +#define TPM_VID_ATML     0x1114
>  
>  enum tpm_chip_flags {
>  	TPM_CHIP_FLAG_TPM2		= BIT(1),
> -- 
> 2.29.0.vfs.0.0
> 
> 

/Jarkko

^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: [PATCH] tpm: fix ATMEL TPM crash caused by too frequent queries
  2021-07-02  6:35     ` Jarkko Sakkinen
@ 2021-07-02  7:12       ` Greg KH
  2021-07-02  7:33       ` Hao Wu
  1 sibling, 0 replies; 91+ messages in thread
From: Greg KH @ 2021-07-02  7:12 UTC (permalink / raw)
  To: Jarkko Sakkinen
  Cc: Hao Wu, shrihari.kalkar, seungyeop.han, anish.jhaveri,
	peterhuewe, jgg, linux-integrity, pmenzel, kgold, zohar,
	why2jjj.linux, hamza, arnd, nayna, James.Bottomley

On Fri, Jul 02, 2021 at 09:35:55AM +0300, Jarkko Sakkinen wrote:
> On Tue, Jun 29, 2021 at 09:22:05PM -0700, Hao Wu wrote:
> > This is a fix for the ATMEL TPM crash bug reported in
> > https://patchwork.kernel.org/project/linux-integrity/patch/20200926223150.109645-1-hao.wu@rubrik.com/
> > 
> > According to the discussions in the original thread,
> > we don't want to revert the timeout of wait_for_tpm_stat
> > for non-ATMEL chips, which brings back the performance cost.
> > For investigation and analysis of why wait_for_tpm_stat
> > caused the issue, and how the regression was introduced,
> > please read the original thread above.
> > 
> > Thus the proposed fix here is to only revert the timeout
> > for ATMEL chips by checking the vendor ID.
> > 
> > Signed-off-by: Hao Wu <hao.wu@rubrik.com>
> > Fixes: 9f3fc7bcddcb ("tpm: replace msleep() with usleep_range() in TPM 1.2/2.0 generic drivers")
> 
> Fixes tag should be before SOB.

Does not matter :)

^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: [PATCH] tpm: fix ATMEL TPM crash caused by too frequent queries
  2021-07-02  6:35     ` Jarkko Sakkinen
  2021-07-02  7:12       ` Greg KH
@ 2021-07-02  7:33       ` Hao Wu
  2021-07-02  7:35         ` Hao Wu
  2021-07-02  7:45         ` Jarkko Sakkinen
  1 sibling, 2 replies; 91+ messages in thread
From: Hao Wu @ 2021-07-02  7:33 UTC (permalink / raw)
  To: Jarkko Sakkinen
  Cc: Shrihari Kalkar, Seungyeop Han, Anish Jhaveri, peterhuewe, jgg,
	linux-integrity, Paul Menzel, Ken Goldman, zohar, why2jjj.linux,
	Hamza Attak, gregkh, arnd, Nayna, James Bottomley



> On Jul 1, 2021, at 11:35 PM, Jarkko Sakkinen <jarkko@kernel.org> wrote:
> 
> On Tue, Jun 29, 2021 at 09:22:05PM -0700, Hao Wu wrote:
>> This is a fix for the ATMEL TPM crash bug reported in
>> https://patchwork.kernel.org/project/linux-integrity/patch/20200926223150.109645-1-hao.wu@rubrik.com/
>> 
>> According to the discussions in the original thread,
>> we don't want to revert the timeout of wait_for_tpm_stat
>> for non-ATMEL chips, which brings back the performance cost.
>> For investigation and analysis of why wait_for_tpm_stat
>> caused the issue, and how the regression was introduced,
>> please read the original thread above.
>> 
>> Thus the proposed fix here is to only revert the timeout
>> for ATMEL chips by checking the vendor ID.
>> 
>> Signed-off-by: Hao Wu <hao.wu@rubrik.com>
>> Fixes: 9f3fc7bcddcb ("tpm: replace msleep() with usleep_range() in TPM 1.2/2.0 generic drivers")
> 
> Fixes tag should be before SOB.
> 
>> ---
>> Test Plan:
>> - Run fixed kernel with ATMEL TPM chips and see crash
>> has been fixed.
>> - Run fixed kernel with non-ATMEL TPM chips, and confirm
>> the timeout has not been changed.
>> 
>> drivers/char/tpm/tpm.h          |  9 ++++++++-
>> drivers/char/tpm/tpm_tis_core.c | 19 +++++++++++++++++--
>> include/linux/tpm.h             |  2 ++
>> 3 files changed, 27 insertions(+), 3 deletions(-)
>> 
>> diff --git a/drivers/char/tpm/tpm.h b/drivers/char/tpm/tpm.h
>> index 283f78211c3a..bc6aa7f9e119 100644
>> --- a/drivers/char/tpm/tpm.h
>> +++ b/drivers/char/tpm/tpm.h
>> @@ -42,7 +42,9 @@ enum tpm_timeout {
>> 	TPM_TIMEOUT_RANGE_US = 300,	/* usecs */
>> 	TPM_TIMEOUT_POLL = 1,	/* msecs */
>> 	TPM_TIMEOUT_USECS_MIN = 100,      /* usecs */
>> -	TPM_TIMEOUT_USECS_MAX = 500      /* usecs */
>> +	TPM_TIMEOUT_USECS_MAX = 500,	/* usecs */
> 
> What is this change?
Need to add the tailing comma

> 
>> +	TPM_TIMEOUT_WAIT_STAT = 500,	/* usecs */
>> +	TPM_ATML_TIMEOUT_WAIT_STAT = 15000	/* usecs */
>> };
>> 
>> /* TPM addresses */
>> @@ -189,6 +191,11 @@ static inline void tpm_msleep(unsigned int delay_msec)
>> 		     delay_msec * 1000);
>> };
>> 
>> +static inline void tpm_usleep(unsigned int delay_usec)
>> +{
>> +	usleep_range(delay_usec - TPM_TIMEOUT_RANGE_US, delay_usec);
>> +};
> 
> Please remove this, and open code.
Ok, will do

>> +
>> int tpm_chip_start(struct tpm_chip *chip);
>> void tpm_chip_stop(struct tpm_chip *chip);
>> struct tpm_chip *tpm_find_get_ops(struct tpm_chip *chip);
>> diff --git a/drivers/char/tpm/tpm_tis_core.c b/drivers/char/tpm/tpm_tis_core.c
>> index 55b9d3965ae1..9ddd4edfe1c2 100644
>> --- a/drivers/char/tpm/tpm_tis_core.c
>> +++ b/drivers/char/tpm/tpm_tis_core.c
>> @@ -80,8 +80,12 @@ static int wait_for_tpm_stat(struct tpm_chip *chip, u8 mask,
>> 		}
>> 	} else {
>> 		do {
>> -			usleep_range(TPM_TIMEOUT_USECS_MIN,
>> -				     TPM_TIMEOUT_USECS_MAX);
>> +			if (chip->timeout_wait_stat && 
>> +				chip->timeout_wait_stat >= TPM_TIMEOUT_WAIT_STAT) {
>> +				tpm_usleep((unsigned int)(chip->timeout_wait_stat));
>> +			} else {
>> +				tpm_usleep((unsigned int)(TPM_TIMEOUT_WAIT_STAT));
>> +			}
> 
> Invalid use of braces. Please read
> 
> https://www.kernel.org/doc/html/v5.13/process/coding-style.html
> 
> Why do you have to use this field conditionally anyway? Why doesn't
> it always contain a legit value?
The field is legit now, but doesn’t hurt to do addition check for robustness 
to ensure no crash ? Just in case the value is updated below TPM_TIMEOUT_WAIT_STAT ? 

Can remove if we think it is not needed.

> 
>> 			status = chip->ops->status(chip);
>> 			if ((status & mask) == mask)
>> 				return 0;
>> @@ -934,6 +938,8 @@ int tpm_tis_core_init(struct device *dev, struct tpm_tis_data *priv, int irq,
>> 	chip->timeout_b = msecs_to_jiffies(TIS_TIMEOUT_B_MAX);
>> 	chip->timeout_c = msecs_to_jiffies(TIS_TIMEOUT_C_MAX);
>> 	chip->timeout_d = msecs_to_jiffies(TIS_TIMEOUT_D_MAX);
>> +	/* init timeout for wait_for_tpm_stat */
>> +	chip->timeout_wait_stat = TPM_TIMEOUT_WAIT_STAT;
>> 	priv->phy_ops = phy_ops;
>> 	dev_set_drvdata(&chip->dev, priv);
>> 
>> @@ -983,6 +989,15 @@ int tpm_tis_core_init(struct device *dev, struct tpm_tis_data *priv, int irq,
>> 
>> 	priv->manufacturer_id = vendor;
>> 
>> +	switch (priv->manufacturer_id) {
>> +	case TPM_VID_ATML:
>> +        /* ATMEL chip needs longer timeout to avoid crash */
Will fix the indentation

Also according to Kenneth we only want to do so for TPM 1.2, 
I will try checking chip->flags against TPM_CHIP_FLAG_TPM2 here
Let me know if there are concerns.
 
>> +		chip->timeout_wait_stat = TPM_ATML_TIMEOUT_WAIT_STAT;
>> +		break;
>> +	default:
>> +		chip->timeout_wait_stat = TPM_TIMEOUT_WAIT_STAT;
>> +	}
>> +
>> 	rc = tpm_tis_read8(priv, TPM_RID(0), &rid);
>> 	if (rc < 0)
>> 		goto out_err;
>> diff --git a/include/linux/tpm.h b/include/linux/tpm.h
>> index aa11fe323c56..35f2a0260d76 100644
>> --- a/include/linux/tpm.h
>> +++ b/include/linux/tpm.h
>> @@ -150,6 +150,7 @@ struct tpm_chip {
>> 	bool timeout_adjusted;
>> 	unsigned long duration[TPM_NUM_DURATIONS]; /* jiffies */
>> 	bool duration_adjusted;
>> +	unsigned long timeout_wait_stat; /* usecs */
>> 
>> 	struct dentry *bios_dir[TPM_NUM_EVENT_LOG_FILES];
>> 
>> @@ -269,6 +270,7 @@ enum tpm2_cc_attrs {
>> #define TPM_VID_INTEL    0x8086
>> #define TPM_VID_WINBOND  0x1050
>> #define TPM_VID_STM      0x104A
>> +#define TPM_VID_ATML     0x1114
>> 
>> enum tpm_chip_flags {
>> 	TPM_CHIP_FLAG_TPM2		= BIT(1),
>> -- 
>> 2.29.0.vfs.0.0
>> 
>> 
> 
> /Jarkko


^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: [PATCH] tpm: fix ATMEL TPM crash caused by too frequent queries
  2021-07-02  7:33       ` Hao Wu
@ 2021-07-02  7:35         ` Hao Wu
  2021-07-02  7:45         ` Jarkko Sakkinen
  1 sibling, 0 replies; 91+ messages in thread
From: Hao Wu @ 2021-07-02  7:35 UTC (permalink / raw)
  To: Jarkko Sakkinen
  Cc: Shrihari Kalkar, Seungyeop Han, Anish Jhaveri, peterhuewe, jgg,
	linux-integrity, Paul Menzel, Ken Goldman, zohar, why2jjj.linux,
	Hamza Attak, gregkh, arnd, Nayna, James Bottomley



> On Jul 2, 2021, at 12:33 AM, Hao Wu <hao.wu@rubrik.com> wrote:
> 
> 
> 
>> On Jul 1, 2021, at 11:35 PM, Jarkko Sakkinen <jarkko@kernel.org> wrote:
>> 
>> On Tue, Jun 29, 2021 at 09:22:05PM -0700, Hao Wu wrote:
>>> This is a fix for the ATMEL TPM crash bug reported in
>>> https://patchwork.kernel.org/project/linux-integrity/patch/20200926223150.109645-1-hao.wu@rubrik.com/
>>> 
>>> According to the discussions in the original thread,
>>> we don't want to revert the timeout of wait_for_tpm_stat
>>> for non-ATMEL chips, which brings back the performance cost.
>>> For investigation and analysis of why wait_for_tpm_stat
>>> caused the issue, and how the regression was introduced,
>>> please read the original thread above.
>>> 
>>> Thus the proposed fix here is to only revert the timeout
>>> for ATMEL chips by checking the vendor ID.
>>> 
>>> Signed-off-by: Hao Wu <hao.wu@rubrik.com>
>>> Fixes: 9f3fc7bcddcb ("tpm: replace msleep() with usleep_range() in TPM 1.2/2.0 generic drivers")
>> 
>> Fixes tag should be before SOB.
>> 
>>> ---
>>> Test Plan:
>>> - Run fixed kernel with ATMEL TPM chips and see crash
>>> has been fixed.
>>> - Run fixed kernel with non-ATMEL TPM chips, and confirm
>>> the timeout has not been changed.
>>> 
>>> drivers/char/tpm/tpm.h          |  9 ++++++++-
>>> drivers/char/tpm/tpm_tis_core.c | 19 +++++++++++++++++--
>>> include/linux/tpm.h             |  2 ++
>>> 3 files changed, 27 insertions(+), 3 deletions(-)
>>> 
>>> diff --git a/drivers/char/tpm/tpm.h b/drivers/char/tpm/tpm.h
>>> index 283f78211c3a..bc6aa7f9e119 100644
>>> --- a/drivers/char/tpm/tpm.h
>>> +++ b/drivers/char/tpm/tpm.h
>>> @@ -42,7 +42,9 @@ enum tpm_timeout {
>>> 	TPM_TIMEOUT_RANGE_US = 300,	/* usecs */
>>> 	TPM_TIMEOUT_POLL = 1,	/* msecs */
>>> 	TPM_TIMEOUT_USECS_MIN = 100,      /* usecs */
>>> -	TPM_TIMEOUT_USECS_MAX = 500      /* usecs */
>>> +	TPM_TIMEOUT_USECS_MAX = 500,	/* usecs */
>> 
>> What is this change?
> Need to add the tailing comma
Ah, sorry, didn’t notice added the duplicated line by mistake. Will remove it. 
> 
>> 
>>> +	TPM_TIMEOUT_WAIT_STAT = 500,	/* usecs */
>>> +	TPM_ATML_TIMEOUT_WAIT_STAT = 15000	/* usecs */
>>> };
>>> 
>>> /* TPM addresses */
>>> @@ -189,6 +191,11 @@ static inline void tpm_msleep(unsigned int delay_msec)
>>> 		     delay_msec * 1000);
>>> };
>>> 
>>> +static inline void tpm_usleep(unsigned int delay_usec)
>>> +{
>>> +	usleep_range(delay_usec - TPM_TIMEOUT_RANGE_US, delay_usec);
>>> +};
>> 
>> Please remove this, and open code.
> Ok, will do
> 
>>> +
>>> int tpm_chip_start(struct tpm_chip *chip);
>>> void tpm_chip_stop(struct tpm_chip *chip);
>>> struct tpm_chip *tpm_find_get_ops(struct tpm_chip *chip);
>>> diff --git a/drivers/char/tpm/tpm_tis_core.c b/drivers/char/tpm/tpm_tis_core.c
>>> index 55b9d3965ae1..9ddd4edfe1c2 100644
>>> --- a/drivers/char/tpm/tpm_tis_core.c
>>> +++ b/drivers/char/tpm/tpm_tis_core.c
>>> @@ -80,8 +80,12 @@ static int wait_for_tpm_stat(struct tpm_chip *chip, u8 mask,
>>> 		}
>>> 	} else {
>>> 		do {
>>> -			usleep_range(TPM_TIMEOUT_USECS_MIN,
>>> -				     TPM_TIMEOUT_USECS_MAX);
>>> +			if (chip->timeout_wait_stat && 
>>> +				chip->timeout_wait_stat >= TPM_TIMEOUT_WAIT_STAT) {
>>> +				tpm_usleep((unsigned int)(chip->timeout_wait_stat));
>>> +			} else {
>>> +				tpm_usleep((unsigned int)(TPM_TIMEOUT_WAIT_STAT));
>>> +			}
>> 
>> Invalid use of braces. Please read
>> 
>> https://www.kernel.org/doc/html/v5.13/process/coding-style.html
>> 
>> Why do you have to use this field conditionally anyway? Why doesn't
>> it always contain a legit value?
> The field is legit now, but doesn’t hurt to do addition check for robustness 
> to ensure no crash ? Just in case the value is updated below TPM_TIMEOUT_WAIT_STAT ? 
> 
> Can remove if we think it is not needed.
> 
>> 
>>> 			status = chip->ops->status(chip);
>>> 			if ((status & mask) == mask)
>>> 				return 0;
>>> @@ -934,6 +938,8 @@ int tpm_tis_core_init(struct device *dev, struct tpm_tis_data *priv, int irq,
>>> 	chip->timeout_b = msecs_to_jiffies(TIS_TIMEOUT_B_MAX);
>>> 	chip->timeout_c = msecs_to_jiffies(TIS_TIMEOUT_C_MAX);
>>> 	chip->timeout_d = msecs_to_jiffies(TIS_TIMEOUT_D_MAX);
>>> +	/* init timeout for wait_for_tpm_stat */
>>> +	chip->timeout_wait_stat = TPM_TIMEOUT_WAIT_STAT;
>>> 	priv->phy_ops = phy_ops;
>>> 	dev_set_drvdata(&chip->dev, priv);
>>> 
>>> @@ -983,6 +989,15 @@ int tpm_tis_core_init(struct device *dev, struct tpm_tis_data *priv, int irq,
>>> 
>>> 	priv->manufacturer_id = vendor;
>>> 
>>> +	switch (priv->manufacturer_id) {
>>> +	case TPM_VID_ATML:
>>> +        /* ATMEL chip needs longer timeout to avoid crash */
> Will fix the indentation
> 
> Also according to Kenneth we only want to do so for TPM 1.2, 
> I will try checking chip->flags against TPM_CHIP_FLAG_TPM2 here
> Let me know if there are concerns.
> 
>>> +		chip->timeout_wait_stat = TPM_ATML_TIMEOUT_WAIT_STAT;
>>> +		break;
>>> +	default:
>>> +		chip->timeout_wait_stat = TPM_TIMEOUT_WAIT_STAT;
>>> +	}
>>> +
>>> 	rc = tpm_tis_read8(priv, TPM_RID(0), &rid);
>>> 	if (rc < 0)
>>> 		goto out_err;
>>> diff --git a/include/linux/tpm.h b/include/linux/tpm.h
>>> index aa11fe323c56..35f2a0260d76 100644
>>> --- a/include/linux/tpm.h
>>> +++ b/include/linux/tpm.h
>>> @@ -150,6 +150,7 @@ struct tpm_chip {
>>> 	bool timeout_adjusted;
>>> 	unsigned long duration[TPM_NUM_DURATIONS]; /* jiffies */
>>> 	bool duration_adjusted;
>>> +	unsigned long timeout_wait_stat; /* usecs */
>>> 
>>> 	struct dentry *bios_dir[TPM_NUM_EVENT_LOG_FILES];
>>> 
>>> @@ -269,6 +270,7 @@ enum tpm2_cc_attrs {
>>> #define TPM_VID_INTEL    0x8086
>>> #define TPM_VID_WINBOND  0x1050
>>> #define TPM_VID_STM      0x104A
>>> +#define TPM_VID_ATML     0x1114
>>> 
>>> enum tpm_chip_flags {
>>> 	TPM_CHIP_FLAG_TPM2		= BIT(1),
>>> -- 
>>> 2.29.0.vfs.0.0
>>> 
>>> 
>> 
>> /Jarkko

Hao


^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: [PATCH] tpm: fix ATMEL TPM crash caused by too frequent queries
  2021-07-02  7:33       ` Hao Wu
  2021-07-02  7:35         ` Hao Wu
@ 2021-07-02  7:45         ` Jarkko Sakkinen
  2021-07-02  7:59           ` Hao Wu
  1 sibling, 1 reply; 91+ messages in thread
From: Jarkko Sakkinen @ 2021-07-02  7:45 UTC (permalink / raw)
  To: Hao Wu
  Cc: Shrihari Kalkar, Seungyeop Han, Anish Jhaveri, peterhuewe, jgg,
	linux-integrity, Paul Menzel, Ken Goldman, zohar, why2jjj.linux,
	Hamza Attak, gregkh, arnd, Nayna, James Bottomley

On Fri, Jul 02, 2021 at 12:33:15AM -0700, Hao Wu wrote:
> 
> 
> > On Jul 1, 2021, at 11:35 PM, Jarkko Sakkinen <jarkko@kernel.org> wrote:
> > 
> > On Tue, Jun 29, 2021 at 09:22:05PM -0700, Hao Wu wrote:
> >> This is a fix for the ATMEL TPM crash bug reported in
> >> https://patchwork.kernel.org/project/linux-integrity/patch/20200926223150.109645-1-hao.wu@rubrik.com/
> >> 
> >> According to the discussions in the original thread,
> >> we don't want to revert the timeout of wait_for_tpm_stat
> >> for non-ATMEL chips, which brings back the performance cost.
> >> For investigation and analysis of why wait_for_tpm_stat
> >> caused the issue, and how the regression was introduced,
> >> please read the original thread above.
> >> 
> >> Thus the proposed fix here is to only revert the timeout
> >> for ATMEL chips by checking the vendor ID.
> >> 
> >> Signed-off-by: Hao Wu <hao.wu@rubrik.com>
> >> Fixes: 9f3fc7bcddcb ("tpm: replace msleep() with usleep_range() in TPM 1.2/2.0 generic drivers")
> > 
> > Fixes tag should be before SOB.
> > 
> >> ---
> >> Test Plan:
> >> - Run fixed kernel with ATMEL TPM chips and see crash
> >> has been fixed.
> >> - Run fixed kernel with non-ATMEL TPM chips, and confirm
> >> the timeout has not been changed.
> >> 
> >> drivers/char/tpm/tpm.h          |  9 ++++++++-
> >> drivers/char/tpm/tpm_tis_core.c | 19 +++++++++++++++++--
> >> include/linux/tpm.h             |  2 ++
> >> 3 files changed, 27 insertions(+), 3 deletions(-)
> >> 
> >> diff --git a/drivers/char/tpm/tpm.h b/drivers/char/tpm/tpm.h
> >> index 283f78211c3a..bc6aa7f9e119 100644
> >> --- a/drivers/char/tpm/tpm.h
> >> +++ b/drivers/char/tpm/tpm.h
> >> @@ -42,7 +42,9 @@ enum tpm_timeout {
> >> 	TPM_TIMEOUT_RANGE_US = 300,	/* usecs */
> >> 	TPM_TIMEOUT_POLL = 1,	/* msecs */
> >> 	TPM_TIMEOUT_USECS_MIN = 100,      /* usecs */
> >> -	TPM_TIMEOUT_USECS_MAX = 500      /* usecs */
> >> +	TPM_TIMEOUT_USECS_MAX = 500,	/* usecs */
> > 
> > What is this change?
> Need to add the tailing comma
> 
> > 
> >> +	TPM_TIMEOUT_WAIT_STAT = 500,	/* usecs */
> >> +	TPM_ATML_TIMEOUT_WAIT_STAT = 15000	/* usecs */
> >> };
> >> 
> >> /* TPM addresses */
> >> @@ -189,6 +191,11 @@ static inline void tpm_msleep(unsigned int delay_msec)
> >> 		     delay_msec * 1000);
> >> };
> >> 
> >> +static inline void tpm_usleep(unsigned int delay_usec)
> >> +{
> >> +	usleep_range(delay_usec - TPM_TIMEOUT_RANGE_US, delay_usec);
> >> +};
> > 
> > Please remove this, and open code.
> Ok, will do
> 
> >> +
> >> int tpm_chip_start(struct tpm_chip *chip);
> >> void tpm_chip_stop(struct tpm_chip *chip);
> >> struct tpm_chip *tpm_find_get_ops(struct tpm_chip *chip);
> >> diff --git a/drivers/char/tpm/tpm_tis_core.c b/drivers/char/tpm/tpm_tis_core.c
> >> index 55b9d3965ae1..9ddd4edfe1c2 100644
> >> --- a/drivers/char/tpm/tpm_tis_core.c
> >> +++ b/drivers/char/tpm/tpm_tis_core.c
> >> @@ -80,8 +80,12 @@ static int wait_for_tpm_stat(struct tpm_chip *chip, u8 mask,
> >> 		}
> >> 	} else {
> >> 		do {
> >> -			usleep_range(TPM_TIMEOUT_USECS_MIN,
> >> -				     TPM_TIMEOUT_USECS_MAX);
> >> +			if (chip->timeout_wait_stat && 
> >> +				chip->timeout_wait_stat >= TPM_TIMEOUT_WAIT_STAT) {
> >> +				tpm_usleep((unsigned int)(chip->timeout_wait_stat));
> >> +			} else {
> >> +				tpm_usleep((unsigned int)(TPM_TIMEOUT_WAIT_STAT));
> >> +			}
> > 
> > Invalid use of braces. Please read
> > 
> > https://www.kernel.org/doc/html/v5.13/process/coding-style.html
> > 
> > Why do you have to use this field conditionally anyway? Why doesn't
> > it always contain a legit value?
> The field is legit now, but doesn’t hurt to do addition check for robustness 
> to ensure no crash ? Just in case the value is updated below TPM_TIMEOUT_WAIT_STAT ? 
> 
> Can remove if we think it is not needed.

A simple question: why you use it conditionally? Can the field contain invalid value?

/Jarkko

^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: [PATCH] tpm: fix ATMEL TPM crash caused by too frequent queries
  2021-07-02  7:45         ` Jarkko Sakkinen
@ 2021-07-02  7:59           ` Hao Wu
  2021-07-02  8:42             ` Jarkko Sakkinen
  0 siblings, 1 reply; 91+ messages in thread
From: Hao Wu @ 2021-07-02  7:59 UTC (permalink / raw)
  To: Jarkko Sakkinen
  Cc: Shrihari Kalkar, Seungyeop Han, Anish Jhaveri, peterhuewe, jgg,
	linux-integrity, Paul Menzel, Ken Goldman, zohar, why2jjj.linux,
	Hamza Attak, gregkh, arnd, Nayna, James Bottomley

> On Jul 2, 2021, at 12:45 AM, Jarkko Sakkinen <jarkko@kernel.org> wrote:
> 
> On Fri, Jul 02, 2021 at 12:33:15AM -0700, Hao Wu wrote:
>> 
>> 
>>> On Jul 1, 2021, at 11:35 PM, Jarkko Sakkinen <jarkko@kernel.org> wrote:
>>> 
>>> On Tue, Jun 29, 2021 at 09:22:05PM -0700, Hao Wu wrote:
>>>> This is a fix for the ATMEL TPM crash bug reported in
>>>> https://patchwork.kernel.org/project/linux-integrity/patch/20200926223150.109645-1-hao.wu@rubrik.com/
>>>> 
>>>> According to the discussions in the original thread,
>>>> we don't want to revert the timeout of wait_for_tpm_stat
>>>> for non-ATMEL chips, which brings back the performance cost.
>>>> For investigation and analysis of why wait_for_tpm_stat
>>>> caused the issue, and how the regression was introduced,
>>>> please read the original thread above.
>>>> 
>>>> Thus the proposed fix here is to only revert the timeout
>>>> for ATMEL chips by checking the vendor ID.
>>>> 
>>>> Signed-off-by: Hao Wu <hao.wu@rubrik.com>
>>>> Fixes: 9f3fc7bcddcb ("tpm: replace msleep() with usleep_range() in TPM 1.2/2.0 generic drivers")
>>> 
>>> Fixes tag should be before SOB.
>>> 
>>>> ---
>>>> Test Plan:
>>>> - Run fixed kernel with ATMEL TPM chips and see crash
>>>> has been fixed.
>>>> - Run fixed kernel with non-ATMEL TPM chips, and confirm
>>>> the timeout has not been changed.
>>>> 
>>>> drivers/char/tpm/tpm.h          |  9 ++++++++-
>>>> drivers/char/tpm/tpm_tis_core.c | 19 +++++++++++++++++--
>>>> include/linux/tpm.h             |  2 ++
>>>> 3 files changed, 27 insertions(+), 3 deletions(-)
>>>> 
>>>> diff --git a/drivers/char/tpm/tpm.h b/drivers/char/tpm/tpm.h
>>>> index 283f78211c3a..bc6aa7f9e119 100644
>>>> --- a/drivers/char/tpm/tpm.h
>>>> +++ b/drivers/char/tpm/tpm.h
>>>> @@ -42,7 +42,9 @@ enum tpm_timeout {
>>>> 	TPM_TIMEOUT_RANGE_US = 300,	/* usecs */
>>>> 	TPM_TIMEOUT_POLL = 1,	/* msecs */
>>>> 	TPM_TIMEOUT_USECS_MIN = 100,      /* usecs */
>>>> -	TPM_TIMEOUT_USECS_MAX = 500      /* usecs */
>>>> +	TPM_TIMEOUT_USECS_MAX = 500,	/* usecs */
>>> 
>>> What is this change?
>> Need to add the tailing comma
>> 
>>> 
>>>> +	TPM_TIMEOUT_WAIT_STAT = 500,	/* usecs */
>>>> +	TPM_ATML_TIMEOUT_WAIT_STAT = 15000	/* usecs */
>>>> };
>>>> 
>>>> /* TPM addresses */
>>>> @@ -189,6 +191,11 @@ static inline void tpm_msleep(unsigned int delay_msec)
>>>> 		     delay_msec * 1000);
>>>> };
>>>> 
>>>> +static inline void tpm_usleep(unsigned int delay_usec)
>>>> +{
>>>> +	usleep_range(delay_usec - TPM_TIMEOUT_RANGE_US, delay_usec);
>>>> +};
>>> 
>>> Please remove this, and open code.
>> Ok, will do
>> 
>>>> +
>>>> int tpm_chip_start(struct tpm_chip *chip);
>>>> void tpm_chip_stop(struct tpm_chip *chip);
>>>> struct tpm_chip *tpm_find_get_ops(struct tpm_chip *chip);
>>>> diff --git a/drivers/char/tpm/tpm_tis_core.c b/drivers/char/tpm/tpm_tis_core.c
>>>> index 55b9d3965ae1..9ddd4edfe1c2 100644
>>>> --- a/drivers/char/tpm/tpm_tis_core.c
>>>> +++ b/drivers/char/tpm/tpm_tis_core.c
>>>> @@ -80,8 +80,12 @@ static int wait_for_tpm_stat(struct tpm_chip *chip, u8 mask,
>>>> 		}
>>>> 	} else {
>>>> 		do {
>>>> -			usleep_range(TPM_TIMEOUT_USECS_MIN,
>>>> -				     TPM_TIMEOUT_USECS_MAX);
>>>> +			if (chip->timeout_wait_stat && 
>>>> +				chip->timeout_wait_stat >= TPM_TIMEOUT_WAIT_STAT) {
>>>> +				tpm_usleep((unsigned int)(chip->timeout_wait_stat));
>>>> +			} else {
>>>> +				tpm_usleep((unsigned int)(TPM_TIMEOUT_WAIT_STAT));
>>>> +			}
>>> 
>>> Invalid use of braces. Please read
>>> 
>>> https://www.kernel.org/doc/html/v5.13/process/coding-style.html
>>> 
>>> Why do you have to use this field conditionally anyway? Why doesn't
>>> it always contain a legit value?
>> The field is legit now, but doesn’t hurt to do addition check for robustness 
>> to ensure no crash ? Just in case the value is updated below TPM_TIMEOUT_WAIT_STAT ? 
>> 
>> Can remove if we think it is not needed.
> 
> A simple question: why you use it conditionally? Can the field contain invalid value?
> 
There are two checks
- chip->timeout_wait_stat >= TPM_TIMEOUT_WAIT_STAT
It could be invalid when future developer set it to some value less than `TPM_TIMEOUT_USECS_MIN`,
and crash the usleep 

- chip->timeout_wait_stat
Yes this is needed, because this code path will be run even chip->timeout_wait_stat has not been initialized in tpm_tis_core_init
from my observation. Didn’t dig into how it is used though.  

> /Jarkko

Hao


^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: [PATCH] tpm: fix ATMEL TPM crash caused by too frequent queries
  2021-07-02  7:59           ` Hao Wu
@ 2021-07-02  8:42             ` Jarkko Sakkinen
  2021-07-02 11:57               ` Jarkko Sakkinen
  0 siblings, 1 reply; 91+ messages in thread
From: Jarkko Sakkinen @ 2021-07-02  8:42 UTC (permalink / raw)
  To: Hao Wu
  Cc: Shrihari Kalkar, Seungyeop Han, Anish Jhaveri, peterhuewe, jgg,
	linux-integrity, Paul Menzel, Ken Goldman, zohar, why2jjj.linux,
	Hamza Attak, gregkh, arnd, Nayna, James Bottomley

On Fri, Jul 02, 2021 at 12:59:18AM -0700, Hao Wu wrote:
> > On Jul 2, 2021, at 12:45 AM, Jarkko Sakkinen <jarkko@kernel.org> wrote:
> > 
> > On Fri, Jul 02, 2021 at 12:33:15AM -0700, Hao Wu wrote:
> >> 
> >> 
> >>> On Jul 1, 2021, at 11:35 PM, Jarkko Sakkinen <jarkko@kernel.org> wrote:
> >>> 
> >>> On Tue, Jun 29, 2021 at 09:22:05PM -0700, Hao Wu wrote:
> >>>> This is a fix for the ATMEL TPM crash bug reported in
> >>>> https://patchwork.kernel.org/project/linux-integrity/patch/20200926223150.109645-1-hao.wu@rubrik.com/
> >>>> 
> >>>> According to the discussions in the original thread,
> >>>> we don't want to revert the timeout of wait_for_tpm_stat
> >>>> for non-ATMEL chips, which brings back the performance cost.
> >>>> For investigation and analysis of why wait_for_tpm_stat
> >>>> caused the issue, and how the regression was introduced,
> >>>> please read the original thread above.
> >>>> 
> >>>> Thus the proposed fix here is to only revert the timeout
> >>>> for ATMEL chips by checking the vendor ID.
> >>>> 
> >>>> Signed-off-by: Hao Wu <hao.wu@rubrik.com>
> >>>> Fixes: 9f3fc7bcddcb ("tpm: replace msleep() with usleep_range() in TPM 1.2/2.0 generic drivers")
> >>> 
> >>> Fixes tag should be before SOB.
> >>> 
> >>>> ---
> >>>> Test Plan:
> >>>> - Run fixed kernel with ATMEL TPM chips and see crash
> >>>> has been fixed.
> >>>> - Run fixed kernel with non-ATMEL TPM chips, and confirm
> >>>> the timeout has not been changed.
> >>>> 
> >>>> drivers/char/tpm/tpm.h          |  9 ++++++++-
> >>>> drivers/char/tpm/tpm_tis_core.c | 19 +++++++++++++++++--
> >>>> include/linux/tpm.h             |  2 ++
> >>>> 3 files changed, 27 insertions(+), 3 deletions(-)
> >>>> 
> >>>> diff --git a/drivers/char/tpm/tpm.h b/drivers/char/tpm/tpm.h
> >>>> index 283f78211c3a..bc6aa7f9e119 100644
> >>>> --- a/drivers/char/tpm/tpm.h
> >>>> +++ b/drivers/char/tpm/tpm.h
> >>>> @@ -42,7 +42,9 @@ enum tpm_timeout {
> >>>> 	TPM_TIMEOUT_RANGE_US = 300,	/* usecs */
> >>>> 	TPM_TIMEOUT_POLL = 1,	/* msecs */
> >>>> 	TPM_TIMEOUT_USECS_MIN = 100,      /* usecs */
> >>>> -	TPM_TIMEOUT_USECS_MAX = 500      /* usecs */
> >>>> +	TPM_TIMEOUT_USECS_MAX = 500,	/* usecs */
> >>> 
> >>> What is this change?
> >> Need to add the tailing comma
> >> 
> >>> 
> >>>> +	TPM_TIMEOUT_WAIT_STAT = 500,	/* usecs */
> >>>> +	TPM_ATML_TIMEOUT_WAIT_STAT = 15000	/* usecs */
> >>>> };
> >>>> 
> >>>> /* TPM addresses */
> >>>> @@ -189,6 +191,11 @@ static inline void tpm_msleep(unsigned int delay_msec)
> >>>> 		     delay_msec * 1000);
> >>>> };
> >>>> 
> >>>> +static inline void tpm_usleep(unsigned int delay_usec)
> >>>> +{
> >>>> +	usleep_range(delay_usec - TPM_TIMEOUT_RANGE_US, delay_usec);
> >>>> +};
> >>> 
> >>> Please remove this, and open code.
> >> Ok, will do
> >> 
> >>>> +
> >>>> int tpm_chip_start(struct tpm_chip *chip);
> >>>> void tpm_chip_stop(struct tpm_chip *chip);
> >>>> struct tpm_chip *tpm_find_get_ops(struct tpm_chip *chip);
> >>>> diff --git a/drivers/char/tpm/tpm_tis_core.c b/drivers/char/tpm/tpm_tis_core.c
> >>>> index 55b9d3965ae1..9ddd4edfe1c2 100644
> >>>> --- a/drivers/char/tpm/tpm_tis_core.c
> >>>> +++ b/drivers/char/tpm/tpm_tis_core.c
> >>>> @@ -80,8 +80,12 @@ static int wait_for_tpm_stat(struct tpm_chip *chip, u8 mask,
> >>>> 		}
> >>>> 	} else {
> >>>> 		do {
> >>>> -			usleep_range(TPM_TIMEOUT_USECS_MIN,
> >>>> -				     TPM_TIMEOUT_USECS_MAX);
> >>>> +			if (chip->timeout_wait_stat && 
> >>>> +				chip->timeout_wait_stat >= TPM_TIMEOUT_WAIT_STAT) {
> >>>> +				tpm_usleep((unsigned int)(chip->timeout_wait_stat));
> >>>> +			} else {
> >>>> +				tpm_usleep((unsigned int)(TPM_TIMEOUT_WAIT_STAT));
> >>>> +			}
> >>> 
> >>> Invalid use of braces. Please read
> >>> 
> >>> https://www.kernel.org/doc/html/v5.13/process/coding-style.html
> >>> 
> >>> Why do you have to use this field conditionally anyway? Why doesn't
> >>> it always contain a legit value?
> >> The field is legit now, but doesn’t hurt to do addition check for robustness 
> >> to ensure no crash ? Just in case the value is updated below TPM_TIMEOUT_WAIT_STAT ? 
> >> 
> >> Can remove if we think it is not needed.
> > 
> > A simple question: why you use it conditionally? Can the field contain invalid value?
> > 
> There are two checks
> - chip->timeout_wait_stat >= TPM_TIMEOUT_WAIT_STAT
> It could be invalid when future developer set it to some value less than `TPM_TIMEOUT_USECS_MIN`,
> and crash the usleep 

I don't understand this. Why you don't set to appropriate value?

/Jarkko

^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: [PATCH] tpm: fix ATMEL TPM crash caused by too frequent queries
  2021-07-02  8:42             ` Jarkko Sakkinen
@ 2021-07-02 11:57               ` Jarkko Sakkinen
  2021-07-02 19:16                 ` Hao Wu
  0 siblings, 1 reply; 91+ messages in thread
From: Jarkko Sakkinen @ 2021-07-02 11:57 UTC (permalink / raw)
  To: Hao Wu
  Cc: Shrihari Kalkar, Seungyeop Han, Anish Jhaveri, peterhuewe, jgg,
	linux-integrity, Paul Menzel, Ken Goldman, zohar, why2jjj.linux,
	Hamza Attak, gregkh, arnd, Nayna, James Bottomley

On Fri, Jul 02, 2021 at 11:42:39AM +0300, Jarkko Sakkinen wrote:
> On Fri, Jul 02, 2021 at 12:59:18AM -0700, Hao Wu wrote:
> > > On Jul 2, 2021, at 12:45 AM, Jarkko Sakkinen <jarkko@kernel.org> wrote:
> > > 
> > > On Fri, Jul 02, 2021 at 12:33:15AM -0700, Hao Wu wrote:
> > >> 
> > >> 
> > >>> On Jul 1, 2021, at 11:35 PM, Jarkko Sakkinen <jarkko@kernel.org> wrote:
> > >>> 
> > >>> On Tue, Jun 29, 2021 at 09:22:05PM -0700, Hao Wu wrote:
> > >>>> This is a fix for the ATMEL TPM crash bug reported in
> > >>>> https://patchwork.kernel.org/project/linux-integrity/patch/20200926223150.109645-1-hao.wu@rubrik.com/
> > >>>> 
> > >>>> According to the discussions in the original thread,
> > >>>> we don't want to revert the timeout of wait_for_tpm_stat
> > >>>> for non-ATMEL chips, which brings back the performance cost.
> > >>>> For investigation and analysis of why wait_for_tpm_stat
> > >>>> caused the issue, and how the regression was introduced,
> > >>>> please read the original thread above.
> > >>>> 
> > >>>> Thus the proposed fix here is to only revert the timeout
> > >>>> for ATMEL chips by checking the vendor ID.
> > >>>> 
> > >>>> Signed-off-by: Hao Wu <hao.wu@rubrik.com>
> > >>>> Fixes: 9f3fc7bcddcb ("tpm: replace msleep() with usleep_range() in TPM 1.2/2.0 generic drivers")
> > >>> 
> > >>> Fixes tag should be before SOB.
> > >>> 
> > >>>> ---
> > >>>> Test Plan:
> > >>>> - Run fixed kernel with ATMEL TPM chips and see crash
> > >>>> has been fixed.
> > >>>> - Run fixed kernel with non-ATMEL TPM chips, and confirm
> > >>>> the timeout has not been changed.
> > >>>> 
> > >>>> drivers/char/tpm/tpm.h          |  9 ++++++++-
> > >>>> drivers/char/tpm/tpm_tis_core.c | 19 +++++++++++++++++--
> > >>>> include/linux/tpm.h             |  2 ++
> > >>>> 3 files changed, 27 insertions(+), 3 deletions(-)
> > >>>> 
> > >>>> diff --git a/drivers/char/tpm/tpm.h b/drivers/char/tpm/tpm.h
> > >>>> index 283f78211c3a..bc6aa7f9e119 100644
> > >>>> --- a/drivers/char/tpm/tpm.h
> > >>>> +++ b/drivers/char/tpm/tpm.h
> > >>>> @@ -42,7 +42,9 @@ enum tpm_timeout {
> > >>>> 	TPM_TIMEOUT_RANGE_US = 300,	/* usecs */
> > >>>> 	TPM_TIMEOUT_POLL = 1,	/* msecs */
> > >>>> 	TPM_TIMEOUT_USECS_MIN = 100,      /* usecs */
> > >>>> -	TPM_TIMEOUT_USECS_MAX = 500      /* usecs */
> > >>>> +	TPM_TIMEOUT_USECS_MAX = 500,	/* usecs */
> > >>> 
> > >>> What is this change?
> > >> Need to add the tailing comma
> > >> 
> > >>> 
> > >>>> +	TPM_TIMEOUT_WAIT_STAT = 500,	/* usecs */
> > >>>> +	TPM_ATML_TIMEOUT_WAIT_STAT = 15000	/* usecs */
> > >>>> };
> > >>>> 
> > >>>> /* TPM addresses */
> > >>>> @@ -189,6 +191,11 @@ static inline void tpm_msleep(unsigned int delay_msec)
> > >>>> 		     delay_msec * 1000);
> > >>>> };
> > >>>> 
> > >>>> +static inline void tpm_usleep(unsigned int delay_usec)
> > >>>> +{
> > >>>> +	usleep_range(delay_usec - TPM_TIMEOUT_RANGE_US, delay_usec);
> > >>>> +};
> > >>> 
> > >>> Please remove this, and open code.
> > >> Ok, will do
> > >> 
> > >>>> +
> > >>>> int tpm_chip_start(struct tpm_chip *chip);
> > >>>> void tpm_chip_stop(struct tpm_chip *chip);
> > >>>> struct tpm_chip *tpm_find_get_ops(struct tpm_chip *chip);
> > >>>> diff --git a/drivers/char/tpm/tpm_tis_core.c b/drivers/char/tpm/tpm_tis_core.c
> > >>>> index 55b9d3965ae1..9ddd4edfe1c2 100644
> > >>>> --- a/drivers/char/tpm/tpm_tis_core.c
> > >>>> +++ b/drivers/char/tpm/tpm_tis_core.c
> > >>>> @@ -80,8 +80,12 @@ static int wait_for_tpm_stat(struct tpm_chip *chip, u8 mask,
> > >>>> 		}
> > >>>> 	} else {
> > >>>> 		do {
> > >>>> -			usleep_range(TPM_TIMEOUT_USECS_MIN,
> > >>>> -				     TPM_TIMEOUT_USECS_MAX);
> > >>>> +			if (chip->timeout_wait_stat && 
> > >>>> +				chip->timeout_wait_stat >= TPM_TIMEOUT_WAIT_STAT) {
> > >>>> +				tpm_usleep((unsigned int)(chip->timeout_wait_stat));
> > >>>> +			} else {
> > >>>> +				tpm_usleep((unsigned int)(TPM_TIMEOUT_WAIT_STAT));
> > >>>> +			}
> > >>> 
> > >>> Invalid use of braces. Please read
> > >>> 
> > >>> https://www.kernel.org/doc/html/v5.13/process/coding-style.html
> > >>> 
> > >>> Why do you have to use this field conditionally anyway? Why doesn't
> > >>> it always contain a legit value?
> > >> The field is legit now, but doesn’t hurt to do addition check for robustness 
> > >> to ensure no crash ? Just in case the value is updated below TPM_TIMEOUT_WAIT_STAT ? 
> > >> 
> > >> Can remove if we think it is not needed.
> > > 
> > > A simple question: why you use it conditionally? Can the field contain invalid value?
> > > 
> > There are two checks
> > - chip->timeout_wait_stat >= TPM_TIMEOUT_WAIT_STAT
> > It could be invalid when future developer set it to some value less than `TPM_TIMEOUT_USECS_MIN`,
> > and crash the usleep 
> 
> I don't understand this. Why you don't set to appropriate value?

What you should do, is to define two fields:

- tpm_timeout_min
- tpm_timeout_max

And initialize these to TPM_TIMEOUT_USECS_MIN and TPM_TIMEOUT_USECS_MAX.

Then fixup those for Atmel (with a simple if-statement, switch-case is
overkill).

The way you work out things right now is broken:

1. Before for non-Atmel: usleep_range(100, 500)
2. After for non-Atmel: usleep_range(200, 500)

I.e. the patch changes code semantically that it should not touch in the
first place.

/Jarkko

^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: [PATCH] tpm: fix ATMEL TPM crash caused by too frequent queries
  2021-07-02 11:57               ` Jarkko Sakkinen
@ 2021-07-02 19:16                 ` Hao Wu
  2021-07-05  5:19                   ` Jarkko Sakkinen
  0 siblings, 1 reply; 91+ messages in thread
From: Hao Wu @ 2021-07-02 19:16 UTC (permalink / raw)
  To: Jarkko Sakkinen
  Cc: Shrihari Kalkar, Seungyeop Han, Anish Jhaveri, peterhuewe, jgg,
	linux-integrity, Paul Menzel, Ken Goldman, zohar, why2jjj.linux,
	Hamza Attak, gregkh, arnd, Nayna, James Bottomley


> On Jul 2, 2021, at 4:57 AM, Jarkko Sakkinen <jarkko@kernel.org> wrote:
> 
> On Fri, Jul 02, 2021 at 11:42:39AM +0300, Jarkko Sakkinen wrote:
>> On Fri, Jul 02, 2021 at 12:59:18AM -0700, Hao Wu wrote:
>>>> On Jul 2, 2021, at 12:45 AM, Jarkko Sakkinen <jarkko@kernel.org> wrote:
>>>> 
>>>> On Fri, Jul 02, 2021 at 12:33:15AM -0700, Hao Wu wrote:
>>>>> 
>>>>> 
>>>>>> On Jul 1, 2021, at 11:35 PM, Jarkko Sakkinen <jarkko@kernel.org> wrote:
>>>>>> 
>>>>>> On Tue, Jun 29, 2021 at 09:22:05PM -0700, Hao Wu wrote:
>>>>>>> This is a fix for the ATMEL TPM crash bug reported in
>>>>>>> https://patchwork.kernel.org/project/linux-integrity/patch/20200926223150.109645-1-hao.wu@rubrik.com/
>>>>>>> 
>>>>>>> According to the discussions in the original thread,
>>>>>>> we don't want to revert the timeout of wait_for_tpm_stat
>>>>>>> for non-ATMEL chips, which brings back the performance cost.
>>>>>>> For investigation and analysis of why wait_for_tpm_stat
>>>>>>> caused the issue, and how the regression was introduced,
>>>>>>> please read the original thread above.
>>>>>>> 
>>>>>>> Thus the proposed fix here is to only revert the timeout
>>>>>>> for ATMEL chips by checking the vendor ID.
>>>>>>> 
>>>>>>> Signed-off-by: Hao Wu <hao.wu@rubrik.com>
>>>>>>> Fixes: 9f3fc7bcddcb ("tpm: replace msleep() with usleep_range() in TPM 1.2/2.0 generic drivers")
>>>>>> 
>>>>>> Fixes tag should be before SOB.
>>>>>> 
>>>>>>> ---
>>>>>>> Test Plan:
>>>>>>> - Run fixed kernel with ATMEL TPM chips and see crash
>>>>>>> has been fixed.
>>>>>>> - Run fixed kernel with non-ATMEL TPM chips, and confirm
>>>>>>> the timeout has not been changed.
>>>>>>> 
>>>>>>> drivers/char/tpm/tpm.h          |  9 ++++++++-
>>>>>>> drivers/char/tpm/tpm_tis_core.c | 19 +++++++++++++++++--
>>>>>>> include/linux/tpm.h             |  2 ++
>>>>>>> 3 files changed, 27 insertions(+), 3 deletions(-)
>>>>>>> 
>>>>>>> diff --git a/drivers/char/tpm/tpm.h b/drivers/char/tpm/tpm.h
>>>>>>> index 283f78211c3a..bc6aa7f9e119 100644
>>>>>>> --- a/drivers/char/tpm/tpm.h
>>>>>>> +++ b/drivers/char/tpm/tpm.h
>>>>>>> @@ -42,7 +42,9 @@ enum tpm_timeout {
>>>>>>> 	TPM_TIMEOUT_RANGE_US = 300,	/* usecs */
>>>>>>> 	TPM_TIMEOUT_POLL = 1,	/* msecs */
>>>>>>> 	TPM_TIMEOUT_USECS_MIN = 100,      /* usecs */
>>>>>>> -	TPM_TIMEOUT_USECS_MAX = 500      /* usecs */
>>>>>>> +	TPM_TIMEOUT_USECS_MAX = 500,	/* usecs */
>>>>>> 
>>>>>> What is this change?
>>>>> Need to add the tailing comma
>>>>> 
>>>>>> 
>>>>>>> +	TPM_TIMEOUT_WAIT_STAT = 500,	/* usecs */
>>>>>>> +	TPM_ATML_TIMEOUT_WAIT_STAT = 15000	/* usecs */
>>>>>>> };
>>>>>>> 
>>>>>>> /* TPM addresses */
>>>>>>> @@ -189,6 +191,11 @@ static inline void tpm_msleep(unsigned int delay_msec)
>>>>>>> 		     delay_msec * 1000);
>>>>>>> };
>>>>>>> 
>>>>>>> +static inline void tpm_usleep(unsigned int delay_usec)
>>>>>>> +{
>>>>>>> +	usleep_range(delay_usec - TPM_TIMEOUT_RANGE_US, delay_usec);
>>>>>>> +};
>>>>>> 
>>>>>> Please remove this, and open code.
>>>>> Ok, will do
>>>>> 
>>>>>>> +
>>>>>>> int tpm_chip_start(struct tpm_chip *chip);
>>>>>>> void tpm_chip_stop(struct tpm_chip *chip);
>>>>>>> struct tpm_chip *tpm_find_get_ops(struct tpm_chip *chip);
>>>>>>> diff --git a/drivers/char/tpm/tpm_tis_core.c b/drivers/char/tpm/tpm_tis_core.c
>>>>>>> index 55b9d3965ae1..9ddd4edfe1c2 100644
>>>>>>> --- a/drivers/char/tpm/tpm_tis_core.c
>>>>>>> +++ b/drivers/char/tpm/tpm_tis_core.c
>>>>>>> @@ -80,8 +80,12 @@ static int wait_for_tpm_stat(struct tpm_chip *chip, u8 mask,
>>>>>>> 		}
>>>>>>> 	} else {
>>>>>>> 		do {
>>>>>>> -			usleep_range(TPM_TIMEOUT_USECS_MIN,
>>>>>>> -				     TPM_TIMEOUT_USECS_MAX);
>>>>>>> +			if (chip->timeout_wait_stat && 
>>>>>>> +				chip->timeout_wait_stat >= TPM_TIMEOUT_WAIT_STAT) {
>>>>>>> +				tpm_usleep((unsigned int)(chip->timeout_wait_stat));
>>>>>>> +			} else {
>>>>>>> +				tpm_usleep((unsigned int)(TPM_TIMEOUT_WAIT_STAT));
>>>>>>> +			}
>>>>>> 
>>>>>> Invalid use of braces. Please read
>>>>>> 
>>>>>> https://www.kernel.org/doc/html/v5.13/process/coding-style.html
>>>>>> 
>>>>>> Why do you have to use this field conditionally anyway? Why doesn't
>>>>>> it always contain a legit value?
>>>>> The field is legit now, but doesn’t hurt to do addition check for robustness 
>>>>> to ensure no crash ? Just in case the value is updated below TPM_TIMEOUT_WAIT_STAT ? 
>>>>> 
>>>>> Can remove if we think it is not needed.
>>>> 
>>>> A simple question: why you use it conditionally? Can the field contain invalid value?
>>>> 
>>> There are two checks
>>> - chip->timeout_wait_stat >= TPM_TIMEOUT_WAIT_STAT
>>> It could be invalid when future developer set it to some value less than `TPM_TIMEOUT_USECS_MIN`,
>>> and crash the usleep 
>> 
>> I don't understand this. Why you don't set to appropriate value?
Ok, fair enough, I assume developers will test it anyway to ensure no crash. Will remove this check.

> What you should do, is to define two fields:
> 
> - tpm_timeout_min
> - tpm_timeout_max
> 
> And initialize these to TPM_TIMEOUT_USECS_MIN and TPM_TIMEOUT_USECS_MAX.
> 
> Then fixup those for Atmel (with a simple if-statement, switch-case is
> overkill).
Switch was more for extensibility when other vendor has similar issue,
but we can refactor when needed in the future. I can use if-statement for now.

> The way you work out things right now is broken:
> 
> 1. Before for non-Atmel: usleep_range(100, 500)
> 2. After for non-Atmel: usleep_range(200, 500)
I realized this in day-1, I think this range change does not matter much.
`TPM_TIMEOUT_RANGE_US=300` is already used in the codebase, I assume people define
such if for general use cases for usleep_range in TPM
But we can add two fields if that makes us more comfortable to strictly follow the current code
semantically.

> I.e. the patch changes code semantically that it should not touch in the
> first place.
> 
> /Jarkko
Hao


^ permalink raw reply	[flat|nested] 91+ messages in thread

* [PATCH] tpm: fix ATMEL TPM crash caused by too frequent queries
  2021-06-30  4:22   ` [PATCH] tpm: fix ATMEL " Hao Wu
  2021-07-02  6:35     ` Jarkko Sakkinen
@ 2021-07-04  0:07     ` Hao Wu
  2021-07-05  7:15       ` Jarkko Sakkinen
  2021-07-07  4:31     ` [PATCH v2] " Hao Wu
  2021-07-09  4:40     ` [PATCH v2] tpm: fix Atmel " Hao Wu
  3 siblings, 1 reply; 91+ messages in thread
From: Hao Wu @ 2021-07-04  0:07 UTC (permalink / raw)
  To: hao.wu, shrihari.kalkar, seungyeop.han, anish.jhaveri,
	peterhuewe, jarkko, jgg, linux-integrity, pmenzel, kgold, zohar,
	why2jjj.linux, hamza, gregkh, arnd, nayna, James.Bottomley

This is a fix for the ATMEL TPM crash bug reported in
https://patchwork.kernel.org/project/linux-integrity/patch/20200926223150.109645-1-hao.wu@rubrik.com/

According to the discussions in the original thread,
we don't want to revert the timeout of wait_for_tpm_stat
for non-ATMEL chips, which brings back the performance cost.
For investigation and analysis of why wait_for_tpm_stat
caused the issue, and how the regression was introduced,
please read the original thread above.

Thus the proposed fix here is to only revert the timeout
for ATMEL chips by checking the vendor ID.

Fixes: 9f3fc7bcddcb ("tpm: replace msleep() with usleep_range() in TPM 1.2/2.0 generic drivers")
Signed-off-by: Hao Wu <hao.wu@rubrik.com>
---
Test Plan:
- Run fixed kernel with ATMEL TPM chips and see crash
has been fixed.
- Run fixed kernel with non-ATMEL TPM chips, and confirm
the timeout has not been changed.

 drivers/char/tpm/tpm.h          |  6 ++++--
 drivers/char/tpm/tpm_tis_core.c | 23 +++++++++++++++++++++--
 include/linux/tpm.h             |  3 +++
 3 files changed, 28 insertions(+), 4 deletions(-)

diff --git a/drivers/char/tpm/tpm.h b/drivers/char/tpm/tpm.h
index 283f78211c3a..6de1b44c4aab 100644
--- a/drivers/char/tpm/tpm.h
+++ b/drivers/char/tpm/tpm.h
@@ -41,8 +41,10 @@ enum tpm_timeout {
 	TPM_TIMEOUT_RETRY = 100, /* msecs */
 	TPM_TIMEOUT_RANGE_US = 300,	/* usecs */
 	TPM_TIMEOUT_POLL = 1,	/* msecs */
-	TPM_TIMEOUT_USECS_MIN = 100,      /* usecs */
-	TPM_TIMEOUT_USECS_MAX = 500      /* usecs */
+	TPM_TIMEOUT_USECS_MIN = 100,	/* usecs */
+	TPM_TIMEOUT_USECS_MAX = 500,	/* usecs */
+	TPM_ATML_TIMEOUT_WAIT_STAT_MIN = 14700,	/* usecs */
+	TPM_ATML_TIMEOUT_WAIT_STAT_MAX = 15000	/* usecs */
 };
 
 /* TPM addresses */
diff --git a/drivers/char/tpm/tpm_tis_core.c b/drivers/char/tpm/tpm_tis_core.c
index 55b9d3965ae1..ae27d66fdd94 100644
--- a/drivers/char/tpm/tpm_tis_core.c
+++ b/drivers/char/tpm/tpm_tis_core.c
@@ -80,8 +80,17 @@ static int wait_for_tpm_stat(struct tpm_chip *chip, u8 mask,
 		}
 	} else {
 		do {
-			usleep_range(TPM_TIMEOUT_USECS_MIN,
-				     TPM_TIMEOUT_USECS_MAX);
+			/* this code path could be executed before
+			 * timeouts initialized in chip instance.
+			 */
+			if (chip->timeout_wait_stat_min &&
+			    chip->timeout_wait_stat_max)
+				usleep_range(chip->timeout_wait_stat_min,
+					     chip->timeout_wait_stat_max);
+			else
+				usleep_range(TPM_TIMEOUT_USECS_MIN,
+					     TPM_TIMEOUT_USECS_MAX);
+
 			status = chip->ops->status(chip);
 			if ((status & mask) == mask)
 				return 0;
@@ -934,6 +943,9 @@ int tpm_tis_core_init(struct device *dev, struct tpm_tis_data *priv, int irq,
 	chip->timeout_b = msecs_to_jiffies(TIS_TIMEOUT_B_MAX);
 	chip->timeout_c = msecs_to_jiffies(TIS_TIMEOUT_C_MAX);
 	chip->timeout_d = msecs_to_jiffies(TIS_TIMEOUT_D_MAX);
+	/* init timeouts for wait_for_tpm_stat */
+	chip->timeout_wait_stat_min = TPM_TIMEOUT_USECS_MIN;
+	chip->timeout_wait_stat_max = TPM_TIMEOUT_USECS_MAX;
 	priv->phy_ops = phy_ops;
 	dev_set_drvdata(&chip->dev, priv);
 
@@ -983,6 +995,13 @@ int tpm_tis_core_init(struct device *dev, struct tpm_tis_data *priv, int irq,
 
 	priv->manufacturer_id = vendor;
 
+	if (priv->manufacturer_id == TPM_VID_ATML &&
+		!(chip->flags & TPM_CHIP_FLAG_TPM2)) {
+		/* If TPM chip is 1.2 ATMEL chip, timeout need to be relaxed*/
+		chip->timeout_wait_stat_min = TPM_ATML_TIMEOUT_WAIT_STAT_MIN;
+		chip->timeout_wait_stat_max = TPM_ATML_TIMEOUT_WAIT_STAT_MAX;
+	}
+
 	rc = tpm_tis_read8(priv, TPM_RID(0), &rid);
 	if (rc < 0)
 		goto out_err;
diff --git a/include/linux/tpm.h b/include/linux/tpm.h
index aa11fe323c56..171b9102c976 100644
--- a/include/linux/tpm.h
+++ b/include/linux/tpm.h
@@ -150,6 +150,8 @@ struct tpm_chip {
 	bool timeout_adjusted;
 	unsigned long duration[TPM_NUM_DURATIONS]; /* jiffies */
 	bool duration_adjusted;
+	unsigned int timeout_wait_stat_min; /* usecs */
+	unsigned int timeout_wait_stat_max; /* usecs */
 
 	struct dentry *bios_dir[TPM_NUM_EVENT_LOG_FILES];
 
@@ -269,6 +271,7 @@ enum tpm2_cc_attrs {
 #define TPM_VID_INTEL    0x8086
 #define TPM_VID_WINBOND  0x1050
 #define TPM_VID_STM      0x104A
+#define TPM_VID_ATML     0x1114
 
 enum tpm_chip_flags {
 	TPM_CHIP_FLAG_TPM2		= BIT(1),
-- 
2.29.0.vfs.0.0


^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: [PATCH] tpm: fix ATMEL TPM crash caused by too frequent queries
  2021-07-02 19:16                 ` Hao Wu
@ 2021-07-05  5:19                   ` Jarkko Sakkinen
  2021-07-05  5:29                     ` Hao Wu
  0 siblings, 1 reply; 91+ messages in thread
From: Jarkko Sakkinen @ 2021-07-05  5:19 UTC (permalink / raw)
  To: Hao Wu
  Cc: Shrihari Kalkar, Seungyeop Han, Anish Jhaveri, peterhuewe, jgg,
	linux-integrity, Paul Menzel, Ken Goldman, zohar, why2jjj.linux,
	Hamza Attak, gregkh, arnd, Nayna, James Bottomley

On Fri, Jul 02, 2021 at 12:16:12PM -0700, Hao Wu wrote:
> 
> > On Jul 2, 2021, at 4:57 AM, Jarkko Sakkinen <jarkko@kernel.org> wrote:
> > 
> > On Fri, Jul 02, 2021 at 11:42:39AM +0300, Jarkko Sakkinen wrote:
> >> On Fri, Jul 02, 2021 at 12:59:18AM -0700, Hao Wu wrote:
> >>>> On Jul 2, 2021, at 12:45 AM, Jarkko Sakkinen <jarkko@kernel.org> wrote:
> >>>> 
> >>>> On Fri, Jul 02, 2021 at 12:33:15AM -0700, Hao Wu wrote:
> >>>>> 
> >>>>> 
> >>>>>> On Jul 1, 2021, at 11:35 PM, Jarkko Sakkinen <jarkko@kernel.org> wrote:
> >>>>>> 
> >>>>>> On Tue, Jun 29, 2021 at 09:22:05PM -0700, Hao Wu wrote:
> >>>>>>> This is a fix for the ATMEL TPM crash bug reported in
> >>>>>>> https://patchwork.kernel.org/project/linux-integrity/patch/20200926223150.109645-1-hao.wu@rubrik.com/
> >>>>>>> 
> >>>>>>> According to the discussions in the original thread,
> >>>>>>> we don't want to revert the timeout of wait_for_tpm_stat
> >>>>>>> for non-ATMEL chips, which brings back the performance cost.
> >>>>>>> For investigation and analysis of why wait_for_tpm_stat
> >>>>>>> caused the issue, and how the regression was introduced,
> >>>>>>> please read the original thread above.
> >>>>>>> 
> >>>>>>> Thus the proposed fix here is to only revert the timeout
> >>>>>>> for ATMEL chips by checking the vendor ID.
> >>>>>>> 
> >>>>>>> Signed-off-by: Hao Wu <hao.wu@rubrik.com>
> >>>>>>> Fixes: 9f3fc7bcddcb ("tpm: replace msleep() with usleep_range() in TPM 1.2/2.0 generic drivers")
> >>>>>> 
> >>>>>> Fixes tag should be before SOB.
> >>>>>> 
> >>>>>>> ---
> >>>>>>> Test Plan:
> >>>>>>> - Run fixed kernel with ATMEL TPM chips and see crash
> >>>>>>> has been fixed.
> >>>>>>> - Run fixed kernel with non-ATMEL TPM chips, and confirm
> >>>>>>> the timeout has not been changed.
> >>>>>>> 
> >>>>>>> drivers/char/tpm/tpm.h          |  9 ++++++++-
> >>>>>>> drivers/char/tpm/tpm_tis_core.c | 19 +++++++++++++++++--
> >>>>>>> include/linux/tpm.h             |  2 ++
> >>>>>>> 3 files changed, 27 insertions(+), 3 deletions(-)
> >>>>>>> 
> >>>>>>> diff --git a/drivers/char/tpm/tpm.h b/drivers/char/tpm/tpm.h
> >>>>>>> index 283f78211c3a..bc6aa7f9e119 100644
> >>>>>>> --- a/drivers/char/tpm/tpm.h
> >>>>>>> +++ b/drivers/char/tpm/tpm.h
> >>>>>>> @@ -42,7 +42,9 @@ enum tpm_timeout {
> >>>>>>> 	TPM_TIMEOUT_RANGE_US = 300,	/* usecs */
> >>>>>>> 	TPM_TIMEOUT_POLL = 1,	/* msecs */
> >>>>>>> 	TPM_TIMEOUT_USECS_MIN = 100,      /* usecs */
> >>>>>>> -	TPM_TIMEOUT_USECS_MAX = 500      /* usecs */
> >>>>>>> +	TPM_TIMEOUT_USECS_MAX = 500,	/* usecs */
> >>>>>> 
> >>>>>> What is this change?
> >>>>> Need to add the tailing comma
> >>>>> 
> >>>>>> 
> >>>>>>> +	TPM_TIMEOUT_WAIT_STAT = 500,	/* usecs */
> >>>>>>> +	TPM_ATML_TIMEOUT_WAIT_STAT = 15000	/* usecs */
> >>>>>>> };
> >>>>>>> 
> >>>>>>> /* TPM addresses */
> >>>>>>> @@ -189,6 +191,11 @@ static inline void tpm_msleep(unsigned int delay_msec)
> >>>>>>> 		     delay_msec * 1000);
> >>>>>>> };
> >>>>>>> 
> >>>>>>> +static inline void tpm_usleep(unsigned int delay_usec)
> >>>>>>> +{
> >>>>>>> +	usleep_range(delay_usec - TPM_TIMEOUT_RANGE_US, delay_usec);
> >>>>>>> +};
> >>>>>> 
> >>>>>> Please remove this, and open code.
> >>>>> Ok, will do
> >>>>> 
> >>>>>>> +
> >>>>>>> int tpm_chip_start(struct tpm_chip *chip);
> >>>>>>> void tpm_chip_stop(struct tpm_chip *chip);
> >>>>>>> struct tpm_chip *tpm_find_get_ops(struct tpm_chip *chip);
> >>>>>>> diff --git a/drivers/char/tpm/tpm_tis_core.c b/drivers/char/tpm/tpm_tis_core.c
> >>>>>>> index 55b9d3965ae1..9ddd4edfe1c2 100644
> >>>>>>> --- a/drivers/char/tpm/tpm_tis_core.c
> >>>>>>> +++ b/drivers/char/tpm/tpm_tis_core.c
> >>>>>>> @@ -80,8 +80,12 @@ static int wait_for_tpm_stat(struct tpm_chip *chip, u8 mask,
> >>>>>>> 		}
> >>>>>>> 	} else {
> >>>>>>> 		do {
> >>>>>>> -			usleep_range(TPM_TIMEOUT_USECS_MIN,
> >>>>>>> -				     TPM_TIMEOUT_USECS_MAX);
> >>>>>>> +			if (chip->timeout_wait_stat && 
> >>>>>>> +				chip->timeout_wait_stat >= TPM_TIMEOUT_WAIT_STAT) {
> >>>>>>> +				tpm_usleep((unsigned int)(chip->timeout_wait_stat));
> >>>>>>> +			} else {
> >>>>>>> +				tpm_usleep((unsigned int)(TPM_TIMEOUT_WAIT_STAT));
> >>>>>>> +			}
> >>>>>> 
> >>>>>> Invalid use of braces. Please read
> >>>>>> 
> >>>>>> https://www.kernel.org/doc/html/v5.13/process/coding-style.html
> >>>>>> 
> >>>>>> Why do you have to use this field conditionally anyway? Why doesn't
> >>>>>> it always contain a legit value?
> >>>>> The field is legit now, but doesn’t hurt to do addition check for robustness 
> >>>>> to ensure no crash ? Just in case the value is updated below TPM_TIMEOUT_WAIT_STAT ? 
> >>>>> 
> >>>>> Can remove if we think it is not needed.
> >>>> 
> >>>> A simple question: why you use it conditionally? Can the field contain invalid value?
> >>>> 
> >>> There are two checks
> >>> - chip->timeout_wait_stat >= TPM_TIMEOUT_WAIT_STAT
> >>> It could be invalid when future developer set it to some value less than `TPM_TIMEOUT_USECS_MIN`,
> >>> and crash the usleep 
> >> 
> >> I don't understand this. Why you don't set to appropriate value?
> Ok, fair enough, I assume developers will test it anyway to ensure no crash. Will remove this check.
> 
> > What you should do, is to define two fields:
> > 
> > - tpm_timeout_min
> > - tpm_timeout_max
> > 
> > And initialize these to TPM_TIMEOUT_USECS_MIN and TPM_TIMEOUT_USECS_MAX.
> > 
> > Then fixup those for Atmel (with a simple if-statement, switch-case is
> > overkill).
> Switch was more for extensibility when other vendor has similar issue,
> but we can refactor when needed in the future. I can use if-statement for now.

Make things more fancy *only* when you actually need more fancy.

> > The way you work out things right now is broken:
> > 
> > 1. Before for non-Atmel: usleep_range(100, 500)
> > 2. After for non-Atmel: usleep_range(200, 500)
> I realized this in day-1, I think this range change does not matter much.

By saying that you are actually saying that *undocumented* semantic changes
to the kernel code are fine as long as they don't change things "too much"

Are you serious about this?

> `TPM_TIMEOUT_RANGE_US=300` is already used in the codebase, I assume people define
> such if for general use cases for usleep_range in TPM
> But we can add two fields if that makes us more comfortable to strictly follow the current code
> semantically.

This has absolutely nothing to do with "comfortable". It's black and white
wrong.

/Jarkko

^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: [PATCH] tpm: fix ATMEL TPM crash caused by too frequent queries
  2021-07-05  5:19                   ` Jarkko Sakkinen
@ 2021-07-05  5:29                     ` Hao Wu
  0 siblings, 0 replies; 91+ messages in thread
From: Hao Wu @ 2021-07-05  5:29 UTC (permalink / raw)
  To: Jarkko Sakkinen
  Cc: Shrihari Kalkar, Seungyeop Han, Anish Jhaveri, peterhuewe, jgg,
	linux-integrity, Paul Menzel, Ken Goldman, zohar, why2jjj.linux,
	Hamza Attak, gregkh, arnd, Nayna, James Bottomley

> On Jul 4, 2021, at 10:19 PM, Jarkko Sakkinen <jarkko@kernel.org> wrote:
> 
> On Fri, Jul 02, 2021 at 12:16:12PM -0700, Hao Wu wrote:
>> 
>>> On Jul 2, 2021, at 4:57 AM, Jarkko Sakkinen <jarkko@kernel.org> wrote:
>>> 
>>> On Fri, Jul 02, 2021 at 11:42:39AM +0300, Jarkko Sakkinen wrote:
>>>> On Fri, Jul 02, 2021 at 12:59:18AM -0700, Hao Wu wrote:
>>>>>> On Jul 2, 2021, at 12:45 AM, Jarkko Sakkinen <jarkko@kernel.org> wrote:
>>>>>> 
>>>>>> On Fri, Jul 02, 2021 at 12:33:15AM -0700, Hao Wu wrote:
>>>>>>> 
>>>>>>> 
>>>>>>>> On Jul 1, 2021, at 11:35 PM, Jarkko Sakkinen <jarkko@kernel.org> wrote:
>>>>>>>> 
>>>>>>>> On Tue, Jun 29, 2021 at 09:22:05PM -0700, Hao Wu wrote:
>>>>>>>>> This is a fix for the ATMEL TPM crash bug reported in
>>>>>>>>> https://patchwork.kernel.org/project/linux-integrity/patch/20200926223150.109645-1-hao.wu@rubrik.com/
>>>>>>>>> 
>>>>>>>>> According to the discussions in the original thread,
>>>>>>>>> we don't want to revert the timeout of wait_for_tpm_stat
>>>>>>>>> for non-ATMEL chips, which brings back the performance cost.
>>>>>>>>> For investigation and analysis of why wait_for_tpm_stat
>>>>>>>>> caused the issue, and how the regression was introduced,
>>>>>>>>> please read the original thread above.
>>>>>>>>> 
>>>>>>>>> Thus the proposed fix here is to only revert the timeout
>>>>>>>>> for ATMEL chips by checking the vendor ID.
>>>>>>>>> 
>>>>>>>>> Signed-off-by: Hao Wu <hao.wu@rubrik.com>
>>>>>>>>> Fixes: 9f3fc7bcddcb ("tpm: replace msleep() with usleep_range() in TPM 1.2/2.0 generic drivers")
>>>>>>>> 
>>>>>>>> Fixes tag should be before SOB.
>>>>>>>> 
>>>>>>>>> ---
>>>>>>>>> Test Plan:
>>>>>>>>> - Run fixed kernel with ATMEL TPM chips and see crash
>>>>>>>>> has been fixed.
>>>>>>>>> - Run fixed kernel with non-ATMEL TPM chips, and confirm
>>>>>>>>> the timeout has not been changed.
>>>>>>>>> 
>>>>>>>>> drivers/char/tpm/tpm.h          |  9 ++++++++-
>>>>>>>>> drivers/char/tpm/tpm_tis_core.c | 19 +++++++++++++++++--
>>>>>>>>> include/linux/tpm.h             |  2 ++
>>>>>>>>> 3 files changed, 27 insertions(+), 3 deletions(-)
>>>>>>>>> 
>>>>>>>>> diff --git a/drivers/char/tpm/tpm.h b/drivers/char/tpm/tpm.h
>>>>>>>>> index 283f78211c3a..bc6aa7f9e119 100644
>>>>>>>>> --- a/drivers/char/tpm/tpm.h
>>>>>>>>> +++ b/drivers/char/tpm/tpm.h
>>>>>>>>> @@ -42,7 +42,9 @@ enum tpm_timeout {
>>>>>>>>> 	TPM_TIMEOUT_RANGE_US = 300,	/* usecs */
>>>>>>>>> 	TPM_TIMEOUT_POLL = 1,	/* msecs */
>>>>>>>>> 	TPM_TIMEOUT_USECS_MIN = 100,      /* usecs */
>>>>>>>>> -	TPM_TIMEOUT_USECS_MAX = 500      /* usecs */
>>>>>>>>> +	TPM_TIMEOUT_USECS_MAX = 500,	/* usecs */
>>>>>>>> 
>>>>>>>> What is this change?
>>>>>>> Need to add the tailing comma
>>>>>>> 
>>>>>>>> 
>>>>>>>>> +	TPM_TIMEOUT_WAIT_STAT = 500,	/* usecs */
>>>>>>>>> +	TPM_ATML_TIMEOUT_WAIT_STAT = 15000	/* usecs */
>>>>>>>>> };
>>>>>>>>> 
>>>>>>>>> /* TPM addresses */
>>>>>>>>> @@ -189,6 +191,11 @@ static inline void tpm_msleep(unsigned int delay_msec)
>>>>>>>>> 		     delay_msec * 1000);
>>>>>>>>> };
>>>>>>>>> 
>>>>>>>>> +static inline void tpm_usleep(unsigned int delay_usec)
>>>>>>>>> +{
>>>>>>>>> +	usleep_range(delay_usec - TPM_TIMEOUT_RANGE_US, delay_usec);
>>>>>>>>> +};
>>>>>>>> 
>>>>>>>> Please remove this, and open code.
>>>>>>> Ok, will do
>>>>>>> 
>>>>>>>>> +
>>>>>>>>> int tpm_chip_start(struct tpm_chip *chip);
>>>>>>>>> void tpm_chip_stop(struct tpm_chip *chip);
>>>>>>>>> struct tpm_chip *tpm_find_get_ops(struct tpm_chip *chip);
>>>>>>>>> diff --git a/drivers/char/tpm/tpm_tis_core.c b/drivers/char/tpm/tpm_tis_core.c
>>>>>>>>> index 55b9d3965ae1..9ddd4edfe1c2 100644
>>>>>>>>> --- a/drivers/char/tpm/tpm_tis_core.c
>>>>>>>>> +++ b/drivers/char/tpm/tpm_tis_core.c
>>>>>>>>> @@ -80,8 +80,12 @@ static int wait_for_tpm_stat(struct tpm_chip *chip, u8 mask,
>>>>>>>>> 		}
>>>>>>>>> 	} else {
>>>>>>>>> 		do {
>>>>>>>>> -			usleep_range(TPM_TIMEOUT_USECS_MIN,
>>>>>>>>> -				     TPM_TIMEOUT_USECS_MAX);
>>>>>>>>> +			if (chip->timeout_wait_stat && 
>>>>>>>>> +				chip->timeout_wait_stat >= TPM_TIMEOUT_WAIT_STAT) {
>>>>>>>>> +				tpm_usleep((unsigned int)(chip->timeout_wait_stat));
>>>>>>>>> +			} else {
>>>>>>>>> +				tpm_usleep((unsigned int)(TPM_TIMEOUT_WAIT_STAT));
>>>>>>>>> +			}
>>>>>>>> 
>>>>>>>> Invalid use of braces. Please read
>>>>>>>> 
>>>>>>>> https://www.kernel.org/doc/html/v5.13/process/coding-style.html
>>>>>>>> 
>>>>>>>> Why do you have to use this field conditionally anyway? Why doesn't
>>>>>>>> it always contain a legit value?
>>>>>>> The field is legit now, but doesn’t hurt to do addition check for robustness 
>>>>>>> to ensure no crash ? Just in case the value is updated below TPM_TIMEOUT_WAIT_STAT ? 
>>>>>>> 
>>>>>>> Can remove if we think it is not needed.
>>>>>> 
>>>>>> A simple question: why you use it conditionally? Can the field contain invalid value?
>>>>>> 
>>>>> There are two checks
>>>>> - chip->timeout_wait_stat >= TPM_TIMEOUT_WAIT_STAT
>>>>> It could be invalid when future developer set it to some value less than `TPM_TIMEOUT_USECS_MIN`,
>>>>> and crash the usleep 
>>>> 
>>>> I don't understand this. Why you don't set to appropriate value?
>> Ok, fair enough, I assume developers will test it anyway to ensure no crash. Will remove this check.
>> 
>>> What you should do, is to define two fields:
>>> 
>>> - tpm_timeout_min
>>> - tpm_timeout_max
>>> 
>>> And initialize these to TPM_TIMEOUT_USECS_MIN and TPM_TIMEOUT_USECS_MAX.
>>> 
>>> Then fixup those for Atmel (with a simple if-statement, switch-case is
>>> overkill).
>> Switch was more for extensibility when other vendor has similar issue,
>> but we can refactor when needed in the future. I can use if-statement for now.
> 
> Make things more fancy *only* when you actually need more fancy.
> 
>>> The way you work out things right now is broken:
>>> 
>>> 1. Before for non-Atmel: usleep_range(100, 500)
>>> 2. After for non-Atmel: usleep_range(200, 500)
>> I realized this in day-1, I think this range change does not matter much.
> 
> By saying that you are actually saying that *undocumented* semantic changes
> to the kernel code are fine as long as they don't change things "too much"
> 
> Are you serious about this?
Fair enough, I agree that keeping things as it avoid potential issues. Thanks for pointing this out!
> 
>> `TPM_TIMEOUT_RANGE_US=300` is already used in the codebase, I assume people define
>> such if for general use cases for usleep_range in TPM
>> But we can add two fields if that makes us more comfortable to strictly follow the current code
>> semantically.
> 
> This has absolutely nothing to do with "comfortable". It's black and white
> wrong.
> 
> /Jarkko

I believe the comments are addressed in 
https://patchwork.kernel.org/project/linux-integrity/patch/20210704000754.1384-1-hao.wu@rubrik.com/

Have tested it with ATMEL 1.2 chip. 

Thanks
Hao

^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: [PATCH] tpm: fix ATMEL TPM crash caused by too frequent queries
  2021-07-04  0:07     ` Hao Wu
@ 2021-07-05  7:15       ` Jarkko Sakkinen
  2021-07-05 23:09         ` Hao Wu
  0 siblings, 1 reply; 91+ messages in thread
From: Jarkko Sakkinen @ 2021-07-05  7:15 UTC (permalink / raw)
  To: Hao Wu
  Cc: shrihari.kalkar, seungyeop.han, anish.jhaveri, peterhuewe, jgg,
	linux-integrity, pmenzel, kgold, zohar, why2jjj.linux, hamza,
	gregkh, arnd, nayna, James.Bottomley

Is this really the first version? Please, use git-format-patch -vX.

On Sat, Jul 03, 2021 at 05:07:54PM -0700, Hao Wu wrote:
> This is a fix for the ATMEL TPM crash bug reported in
> https://patchwork.kernel.org/project/linux-integrity/patch/20200926223150.109645-1-hao.wu@rubrik.com/
> 
> According to the discussions in the original thread,
> we don't want to revert the timeout of wait_for_tpm_stat
> for non-ATMEL chips, which brings back the performance cost.
> For investigation and analysis of why wait_for_tpm_stat
> caused the issue, and how the regression was introduced,
> please read the original thread above.

Please, no xrefs. Instead, describe what you are doing.

> Thus the proposed fix here is to only revert the timeout
> for ATMEL chips by checking the vendor ID.

What do you mean by reverting?

The long description needs a full rewrite.

You can add

Link: https://patchwork.kernel.org/project/linux-integrity/patch/20200926223150.109645-1-hao.wu@rubrik.com/

But do not expect anyone to read the thread in order to
understand what the commit is doing.

> Fixes: 9f3fc7bcddcb ("tpm: replace msleep() with usleep_range() in TPM 1.2/2.0 generic drivers")
> Signed-off-by: Hao Wu <hao.wu@rubrik.com>
> ---
> Test Plan:
> - Run fixed kernel with ATMEL TPM chips and see crash
> has been fixed.
> - Run fixed kernel with non-ATMEL TPM chips, and confirm
> the timeout has not been changed.

The changelog is missing.

Please read section 14 of

https://www.kernel.org/doc/html/v4.17/process/submitting-patches.html

/Jarkko

^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: [PATCH] tpm: fix ATMEL TPM crash caused by too frequent queries
  2021-07-05  7:15       ` Jarkko Sakkinen
@ 2021-07-05 23:09         ` Hao Wu
  2021-07-06 12:34           ` Mimi Zohar
  0 siblings, 1 reply; 91+ messages in thread
From: Hao Wu @ 2021-07-05 23:09 UTC (permalink / raw)
  To: Jarkko Sakkinen
  Cc: Shrihari Kalkar, Seungyeop Han, Anish Jhaveri, peterhuewe, jgg,
	linux-integrity, Paul Menzel, Ken Goldman, zohar, why2jjj.linux,
	Hamza Attak, gregkh, arnd, Nayna, James.Bottomley

> On Jul 5, 2021, at 12:15 AM, Jarkko Sakkinen <jarkko@kernel.org> wrote:
> 
> Is this really the first version? Please, use git-format-patch -vX.
Got it, will re-send the patch with `[PATCH v2]`. 
Thanks for bearing my mistakes, I am not quite familiar with the workflow yet.

> On Sat, Jul 03, 2021 at 05:07:54PM -0700, Hao Wu wrote:
>> This is a fix for the ATMEL TPM crash bug reported in
>> https://patchwork.kernel.org/project/linux-integrity/patch/20200926223150.109645-1-hao.wu@rubrik.com/
>> 
>> According to the discussions in the original thread,
>> we don't want to revert the timeout of wait_for_tpm_stat
>> for non-ATMEL chips, which brings back the performance cost.
>> For investigation and analysis of why wait_for_tpm_stat
>> caused the issue, and how the regression was introduced,
>> please read the original thread above.
> 
> Please, no xrefs. Instead, describe what you are doing.
Ok, will rewrite the commit message
> 
>> Thus the proposed fix here is to only revert the timeout
>> for ATMEL chips by checking the vendor ID.
> 
> What do you mean by reverting?
> 
> The long description needs a full rewrite.
> 
> You can add
> 
> Link: https://patchwork.kernel.org/project/linux-integrity/patch/20200926223150.109645-1-hao.wu@rubrik.com/
> 
> But do not expect anyone to read the thread in order to
> understand what the commit is doing.
> 
>> Fixes: 9f3fc7bcddcb ("tpm: replace msleep() with usleep_range() in TPM 1.2/2.0 generic drivers")
>> Signed-off-by: Hao Wu <hao.wu@rubrik.com>
>> ---
>> Test Plan:
>> - Run fixed kernel with ATMEL TPM chips and see crash
>> has been fixed.
>> - Run fixed kernel with non-ATMEL TPM chips, and confirm
>> the timeout has not been changed.
> 
> The changelog is missing.
Sorry, I don’t get your point here. Could you help clarify it a bit.
I did follow the section 14, but I didn’t see anything specifically needed after `---`
Could you be specific ? Thanks for your time. 

> Please read section 14 of
> 
> https://www.kernel.org/doc/html/v4.17/process/submitting-patches.html
> 
> /Jarkko
Hao

^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: [PATCH] tpm: fix ATMEL TPM crash caused by too frequent queries
  2021-07-05 23:09         ` Hao Wu
@ 2021-07-06 12:34           ` Mimi Zohar
  2021-07-07  4:18             ` Hao Wu
  0 siblings, 1 reply; 91+ messages in thread
From: Mimi Zohar @ 2021-07-06 12:34 UTC (permalink / raw)
  To: Hao Wu, Jarkko Sakkinen
  Cc: Shrihari Kalkar, Seungyeop Han, Anish Jhaveri, peterhuewe, jgg,
	linux-integrity, Paul Menzel, Ken Goldman, zohar, why2jjj.linux,
	Hamza Attak, gregkh, arnd, Nayna, James.Bottomley

On Mon, 2021-07-05 at 16:09 -0700, Hao Wu wrote:
> > On Jul 5, 2021, at 12:15 AM, Jarkko Sakkinen <jarkko@kernel.org> wrote:
> > 
> 
> >> Signed-off-by: Hao Wu <hao.wu@rubrik.com>
> >> ---
> >> Test Plan:
> >> - Run fixed kernel with ATMEL TPM chips and see crash
> >> has been fixed.
> >> - Run fixed kernel with non-ATMEL TPM chips, and confirm
> >> the timeout has not been changed.
> > 
> > The changelog is missing.
> Sorry, I don’t get your point here. Could you help clarify it a bit.
> I did follow the section 14, but I didn’t see anything specifically needed after `---`
> Could you be specific ? Thanks for your time. 

The changes from one version of a patch, or patch set, to the next has
moved around a bit.  Some people put it in the cover letter, others put
it on the individual patches.   It's also moved from within the patch
description to after the dashes.  Documentation/process/submitting-
patches.rst provides an example, but leaves out the word "Changelog".  
For an explanation of what goes into the patch description versus the
Changelog, search for "Other comments relevant only to the moment or
the maintainer".

For example, this version of the patch limits increasing the delay just
for Atmel TPM 1.2 chips.  At minimum it would be included in the
Changelog, but more likely included in the patch description itself and
perhaps even in the Subject line.

thanks,

Mimi

> 
> > Please read section 14 of
> > 
> > https://www.kernel.org/doc/html/v4.17/process/submitting-patches.html
> > 
> > /Jarkko
> Hao



^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: [PATCH] tpm: fix ATMEL TPM crash caused by too frequent queries
  2021-07-06 12:34           ` Mimi Zohar
@ 2021-07-07  4:18             ` Hao Wu
  2021-07-07  4:34               ` Hao Wu
  0 siblings, 1 reply; 91+ messages in thread
From: Hao Wu @ 2021-07-07  4:18 UTC (permalink / raw)
  To: Mimi Zohar
  Cc: Jarkko Sakkinen, Shrihari Kalkar, Seungyeop Han, Anish Jhaveri,
	peterhuewe, jgg, linux-integrity, Paul Menzel, Ken Goldman,
	zohar, why2jjj.linux, Hamza Attak, gregkh, arnd, Nayna,
	James.Bottomley

> On Jul 6, 2021, at 5:34 AM, Mimi Zohar <zohar@linux.ibm.com> wrote:
> 
> On Mon, 2021-07-05 at 16:09 -0700, Hao Wu wrote:
>>> On Jul 5, 2021, at 12:15 AM, Jarkko Sakkinen <jarkko@kernel.org> wrote:
>>> 
>> 
>>>> Signed-off-by: Hao Wu <hao.wu@rubrik.com>
>>>> ---
>>>> Test Plan:
>>>> - Run fixed kernel with ATMEL TPM chips and see crash
>>>> has been fixed.
>>>> - Run fixed kernel with non-ATMEL TPM chips, and confirm
>>>> the timeout has not been changed.
>>> 
>>> The changelog is missing.
>> Sorry, I don’t get your point here. Could you help clarify it a bit.
>> I did follow the section 14, but I didn’t see anything specifically needed after `---`
>> Could you be specific ? Thanks for your time. 
> 
> The changes from one version of a patch, or patch set, to the next has
> moved around a bit.  Some people put it in the cover letter, others put
> it on the individual patches.   It's also moved from within the patch
> description to after the dashes.  Documentation/process/submitting-
> patches.rst provides an example, but leaves out the word "Changelog".  
> For an explanation of what goes into the patch description versus the
> Changelog, search for "Other comments relevant only to the moment or
> the maintainer".
I see. That makes sense to me now. The term “changelog” appears multiple times
In the doc for different purpose, thus it was confusing to me. Here we are
referring to "patch changelog”. Will add it after `—`. Thanks for the clarification!

> 
> For example, this version of the patch limits increasing the delay just
> for Atmel TPM 1.2 chips.  At minimum it would be included in the
> Changelog, but more likely included in the patch description itself and
> perhaps even in the Subject line.
> 
> thanks,
> 
> Mimi
> 
>> 
>>> Please read section 14 of
>>> 
>>> https://www.kernel.org/doc/html/v4.17/process/submitting-patches.html
>>> 
>>> /Jarkko
>> Hao
> 
> 
Hao

^ permalink raw reply	[flat|nested] 91+ messages in thread

* [PATCH v2] tpm: fix ATMEL TPM crash caused by too frequent queries
  2021-06-30  4:22   ` [PATCH] tpm: fix ATMEL " Hao Wu
  2021-07-02  6:35     ` Jarkko Sakkinen
  2021-07-04  0:07     ` Hao Wu
@ 2021-07-07  4:31     ` Hao Wu
  2021-07-07  9:24       ` Jarkko Sakkinen
  2021-07-09  4:40     ` [PATCH v2] tpm: fix Atmel " Hao Wu
  3 siblings, 1 reply; 91+ messages in thread
From: Hao Wu @ 2021-07-07  4:31 UTC (permalink / raw)
  To: hao.wu, shrihari.kalkar, seungyeop.han, anish.jhaveri,
	peterhuewe, jarkko, jgg, linux-integrity, pmenzel, kgold, zohar,
	why2jjj.linux, hamza, gregkh, arnd, nayna, James.Bottomley

Since kernel 4.14, there was a commit (9f3fc7bcddcb)
fixed the TPM sleep logic from msleep to usleep_range,
so that the TPM sleeps exactly with TPM_TIMEOUT (=5ms) afterward.
Before that fix, msleep(5) actually sleeps for around 15ms.

This timeout change caused the ATMEL 1.2 TPM chip crash,
and the patch here is to fix it in the master branch.
Crash signature is as follows:
```
$ tpm_sealdata -z
Tspi_Key_LoadKey failed: 0x00001087 - layer=tddl, code=0087 (135),
I/O error

$ sudo dmesg | grep tpm0
[59154.665549] tpm tpm0: tpm_try_transmit: send(): error -62
```

With at few more changes after 4.14, the timeout was
reduced to less than 1 ms today in the master branch.
- in 4.16 commit cf151a9a44d5 uses `TPM_POLL_SLEEP` instead of
  TPM_TIMEOUT for `wait_for_tpm_stat` and set `TPM_POLL_SLEEP` (1ms).
- in 4.18 commits 59f5a6b07f64 and 424eaf910c32 further
  reduced the timeout in wait_for_tpm_stat to less than 1ms.

This patch is to fix the TPM crash for ATMEL 1.2 TPM chip.
We specifically use 15ms timeout for the ATMEL 1.2 TPM chip
in wait_for_tpm_stat, but keep the timeout for other chips
unchanged. The 15ms timeout was the timeout
works for ATMEL 1.2 TPM chip before 4.14.

Fixes: 9f3fc7bcddcb ("tpm: replace msleep() with usleep_range() in TPM 1.2/2.0 generic drivers")
Link: https://patchwork.kernel.org/project/linux-integrity/patch/20200926223150.109645-1-hao.wu@rubrik.com/
Signed-off-by: Hao Wu <hao.wu@rubrik.com>
---
This version (v2) has following changes on top of the last (v1):
- follow the existing way to define two timeouts (min and max)
  for ATMEL chip, thus keep the exact timeout logic for 
  non-ATEML chips.
- limit the timeout increase to only ATMEL TPM 1.2 chips,
  because it is not an issue for TPM 2.0 chips yet.

Test Plan:
- Run fixed kernel with ATMEL TPM chips and see crash
has been fixed.
- Run fixed kernel with non-ATMEL TPM chips, and confirm
the timeout has not been changed.

 drivers/char/tpm/tpm.h          |  6 ++++--
 drivers/char/tpm/tpm_tis_core.c | 23 +++++++++++++++++++++--
 include/linux/tpm.h             |  3 +++
 3 files changed, 28 insertions(+), 4 deletions(-)

diff --git a/drivers/char/tpm/tpm.h b/drivers/char/tpm/tpm.h
index 283f78211c3a..6de1b44c4aab 100644
--- a/drivers/char/tpm/tpm.h
+++ b/drivers/char/tpm/tpm.h
@@ -41,8 +41,10 @@ enum tpm_timeout {
 	TPM_TIMEOUT_RETRY = 100, /* msecs */
 	TPM_TIMEOUT_RANGE_US = 300,	/* usecs */
 	TPM_TIMEOUT_POLL = 1,	/* msecs */
-	TPM_TIMEOUT_USECS_MIN = 100,      /* usecs */
-	TPM_TIMEOUT_USECS_MAX = 500      /* usecs */
+	TPM_TIMEOUT_USECS_MIN = 100,	/* usecs */
+	TPM_TIMEOUT_USECS_MAX = 500,	/* usecs */
+	TPM_ATML_TIMEOUT_WAIT_STAT_MIN = 14700,	/* usecs */
+	TPM_ATML_TIMEOUT_WAIT_STAT_MAX = 15000	/* usecs */
 };
 
 /* TPM addresses */
diff --git a/drivers/char/tpm/tpm_tis_core.c b/drivers/char/tpm/tpm_tis_core.c
index 55b9d3965ae1..ae27d66fdd94 100644
--- a/drivers/char/tpm/tpm_tis_core.c
+++ b/drivers/char/tpm/tpm_tis_core.c
@@ -80,8 +80,17 @@ static int wait_for_tpm_stat(struct tpm_chip *chip, u8 mask,
 		}
 	} else {
 		do {
-			usleep_range(TPM_TIMEOUT_USECS_MIN,
-				     TPM_TIMEOUT_USECS_MAX);
+			/* this code path could be executed before
+			 * timeouts initialized in chip instance.
+			 */
+			if (chip->timeout_wait_stat_min &&
+			    chip->timeout_wait_stat_max)
+				usleep_range(chip->timeout_wait_stat_min,
+					     chip->timeout_wait_stat_max);
+			else
+				usleep_range(TPM_TIMEOUT_USECS_MIN,
+					     TPM_TIMEOUT_USECS_MAX);
+
 			status = chip->ops->status(chip);
 			if ((status & mask) == mask)
 				return 0;
@@ -934,6 +943,9 @@ int tpm_tis_core_init(struct device *dev, struct tpm_tis_data *priv, int irq,
 	chip->timeout_b = msecs_to_jiffies(TIS_TIMEOUT_B_MAX);
 	chip->timeout_c = msecs_to_jiffies(TIS_TIMEOUT_C_MAX);
 	chip->timeout_d = msecs_to_jiffies(TIS_TIMEOUT_D_MAX);
+	/* init timeouts for wait_for_tpm_stat */
+	chip->timeout_wait_stat_min = TPM_TIMEOUT_USECS_MIN;
+	chip->timeout_wait_stat_max = TPM_TIMEOUT_USECS_MAX;
 	priv->phy_ops = phy_ops;
 	dev_set_drvdata(&chip->dev, priv);
 
@@ -983,6 +995,13 @@ int tpm_tis_core_init(struct device *dev, struct tpm_tis_data *priv, int irq,
 
 	priv->manufacturer_id = vendor;
 
+	if (priv->manufacturer_id == TPM_VID_ATML &&
+		!(chip->flags & TPM_CHIP_FLAG_TPM2)) {
+		/* If TPM chip is 1.2 ATMEL chip, timeout need to be relaxed*/
+		chip->timeout_wait_stat_min = TPM_ATML_TIMEOUT_WAIT_STAT_MIN;
+		chip->timeout_wait_stat_max = TPM_ATML_TIMEOUT_WAIT_STAT_MAX;
+	}
+
 	rc = tpm_tis_read8(priv, TPM_RID(0), &rid);
 	if (rc < 0)
 		goto out_err;
diff --git a/include/linux/tpm.h b/include/linux/tpm.h
index aa11fe323c56..171b9102c976 100644
--- a/include/linux/tpm.h
+++ b/include/linux/tpm.h
@@ -150,6 +150,8 @@ struct tpm_chip {
 	bool timeout_adjusted;
 	unsigned long duration[TPM_NUM_DURATIONS]; /* jiffies */
 	bool duration_adjusted;
+	unsigned int timeout_wait_stat_min; /* usecs */
+	unsigned int timeout_wait_stat_max; /* usecs */
 
 	struct dentry *bios_dir[TPM_NUM_EVENT_LOG_FILES];
 
@@ -269,6 +271,7 @@ enum tpm2_cc_attrs {
 #define TPM_VID_INTEL    0x8086
 #define TPM_VID_WINBOND  0x1050
 #define TPM_VID_STM      0x104A
+#define TPM_VID_ATML     0x1114
 
 enum tpm_chip_flags {
 	TPM_CHIP_FLAG_TPM2		= BIT(1),
-- 
2.29.0.vfs.0.0


^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: [PATCH] tpm: fix ATMEL TPM crash caused by too frequent queries
  2021-07-07  4:18             ` Hao Wu
@ 2021-07-07  4:34               ` Hao Wu
  0 siblings, 0 replies; 91+ messages in thread
From: Hao Wu @ 2021-07-07  4:34 UTC (permalink / raw)
  To: Mimi Zohar, Jarkko Sakkinen
  Cc: Shrihari Kalkar, Seungyeop Han, Anish Jhaveri, peterhuewe, jgg,
	linux-integrity, Paul Menzel, Ken Goldman, zohar, why2jjj.linux,
	Hamza Attak, gregkh, arnd, Nayna, James Bottomley

> On Jul 6, 2021, at 9:18 PM, Hao Wu <hao.wu@rubrik.com> wrote:
> 
>> On Jul 6, 2021, at 5:34 AM, Mimi Zohar <zohar@linux.ibm.com> wrote:
>> 
>> On Mon, 2021-07-05 at 16:09 -0700, Hao Wu wrote:
>>>> On Jul 5, 2021, at 12:15 AM, Jarkko Sakkinen <jarkko@kernel.org> wrote:
>>>> 
>>> 
>>>>> Signed-off-by: Hao Wu <hao.wu@rubrik.com>
>>>>> ---
>>>>> Test Plan:
>>>>> - Run fixed kernel with ATMEL TPM chips and see crash
>>>>> has been fixed.
>>>>> - Run fixed kernel with non-ATMEL TPM chips, and confirm
>>>>> the timeout has not been changed.
>>>> 
>>>> The changelog is missing.
>>> Sorry, I don’t get your point here. Could you help clarify it a bit.
>>> I did follow the section 14, but I didn’t see anything specifically needed after `---`
>>> Could you be specific ? Thanks for your time. 
>> 
>> The changes from one version of a patch, or patch set, to the next has
>> moved around a bit.  Some people put it in the cover letter, others put
>> it on the individual patches.   It's also moved from within the patch
>> description to after the dashes.  Documentation/process/submitting-
>> patches.rst provides an example, but leaves out the word "Changelog".  
>> For an explanation of what goes into the patch description versus the
>> Changelog, search for "Other comments relevant only to the moment or
>> the maintainer".
> I see. That makes sense to me now. The term “changelog” appears multiple times
> In the doc for different purpose, thus it was confusing to me. Here we are
> referring to "patch changelog”. Will add it after `—`. Thanks for the clarification!
> 
>> 
>> For example, this version of the patch limits increasing the delay just
>> for Atmel TPM 1.2 chips.  At minimum it would be included in the
>> Changelog, but more likely included in the patch description itself and
>> perhaps even in the Subject line.
>> 
>> thanks,
>> 
>> Mimi
>> 
>>> 
>>>> Please read section 14 of
>>>> 
>>>> https://www.kernel.org/doc/html/v4.17/process/submitting-patches.html
>>>> 
>>>> /Jarkko
>>> Hao
>> 
>> 
> Hao

Updated in https://patchwork.kernel.org/project/linux-integrity/patch/20210707043135.33434-1-hao.wu@rubrik.com/
Hopefully this time it is qualified.

Hao

^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: [PATCH v2] tpm: fix ATMEL TPM crash caused by too frequent queries
  2021-07-07  4:31     ` [PATCH v2] " Hao Wu
@ 2021-07-07  9:24       ` Jarkko Sakkinen
  2021-07-07 18:28         ` Hao Wu
  0 siblings, 1 reply; 91+ messages in thread
From: Jarkko Sakkinen @ 2021-07-07  9:24 UTC (permalink / raw)
  To: Hao Wu
  Cc: shrihari.kalkar, seungyeop.han, anish.jhaveri, peterhuewe, jgg,
	linux-integrity, pmenzel, kgold, zohar, why2jjj.linux, hamza,
	gregkh, arnd, nayna, James.Bottomley

On Tue, Jul 06, 2021 at 09:31:35PM -0700, Hao Wu wrote:
> Since kernel 4.14, there was a commit (9f3fc7bcddcb)
> fixed the TPM sleep logic from msleep to usleep_range,
> so that the TPM sleeps exactly with TPM_TIMEOUT (=5ms) afterward.
> Before that fix, msleep(5) actually sleeps for around 15ms.

What is TPM sleep logic?

/Jarkko

^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: [PATCH v2] tpm: fix ATMEL TPM crash caused by too frequent queries
  2021-07-07  9:24       ` Jarkko Sakkinen
@ 2021-07-07 18:28         ` Hao Wu
  2021-07-07 21:10           ` Jarkko Sakkinen
  0 siblings, 1 reply; 91+ messages in thread
From: Hao Wu @ 2021-07-07 18:28 UTC (permalink / raw)
  To: Jarkko Sakkinen
  Cc: Shrihari Kalkar, Seungyeop Han, Anish Jhaveri, peterhuewe, jgg,
	linux-integrity, Paul Menzel, Ken Goldman, zohar, why2jjj.linux,
	Hamza Attak, gregkh, arnd, Nayna, James.Bottomley

> On Jul 7, 2021, at 2:24 AM, Jarkko Sakkinen <jarkko@kernel.org> wrote:
> 
> On Tue, Jul 06, 2021 at 09:31:35PM -0700, Hao Wu wrote:
>> Since kernel 4.14, there was a commit (9f3fc7bcddcb)
>> fixed the TPM sleep logic from msleep to usleep_range,
>> so that the TPM sleeps exactly with TPM_TIMEOUT (=5ms) afterward.
>> Before that fix, msleep(5) actually sleeps for around 15ms.
> 
> What is TPM sleep logic?
It is about the commit metnioned in the description
`tpm: replace msleep() with usleep_range() in TPM 1.2/2.0 generic drivers`
https://github.com/torvalds/linux/commit/9f3fc7bcddcb51234e23494531f93ab60475e1c3

Any better description or terms ?

> 
> /Jarkko

Hao

^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: [PATCH v2] tpm: fix ATMEL TPM crash caused by too frequent queries
  2021-07-07 18:28         ` Hao Wu
@ 2021-07-07 21:10           ` Jarkko Sakkinen
  2021-07-09  4:43             ` Hao Wu
  0 siblings, 1 reply; 91+ messages in thread
From: Jarkko Sakkinen @ 2021-07-07 21:10 UTC (permalink / raw)
  To: Hao Wu
  Cc: Shrihari Kalkar, Seungyeop Han, Anish Jhaveri, peterhuewe, jgg,
	linux-integrity, Paul Menzel, Ken Goldman, zohar, why2jjj.linux,
	Hamza Attak, gregkh, arnd, Nayna, James.Bottomley

On Wed, Jul 07, 2021 at 11:28:35AM -0700, Hao Wu wrote:
> > On Jul 7, 2021, at 2:24 AM, Jarkko Sakkinen <jarkko@kernel.org> wrote:
> > 
> > On Tue, Jul 06, 2021 at 09:31:35PM -0700, Hao Wu wrote:
> >> Since kernel 4.14, there was a commit (9f3fc7bcddcb)

BTW, please remove this. You have a fixes tag.

> >> fixed the TPM sleep logic from msleep to usleep_range,
> >> so that the TPM sleeps exactly with TPM_TIMEOUT (=5ms) afterward.
> >> Before that fix, msleep(5) actually sleeps for around 15ms.
> > 
> > What is TPM sleep logic?
> It is about the commit metnioned in the description
> `tpm: replace msleep() with usleep_range() in TPM 1.2/2.0 generic drivers`
> https://github.com/torvalds/linux/commit/9f3fc7bcddcb51234e23494531f93ab60475e1c3

What you should do is to explain in simple terms the unwanted behaviour
that you are observing, and also, *when* you observe it. E.g. does this
happen when you use /dev/tpm0, or is it visible already in klog at boot
time. And also: does it occur for anything you put to /dev/tpm0, or is
the bug triggering for some particular TPM commands.

You also need to have information what kind of Atmel chip triggers the
bug. I'd presume that you have access to a machine with such chip?

When you get all that figured out, you should explain how you change
the existing behaviour, and why it makes sense. E.g. if you fixup
timeouts, please just tell how'd you end up choosing the values that
you picked.

E.g. the rationale for that could come from testing and finding the "sweet
spot", or perhaps the reason could be that old values worked, new ones
don't.

Especially in bug fixes the reasoning is *at least* as important as the
the code change itself because I need to know what is going on.

/Jarkko

^ permalink raw reply	[flat|nested] 91+ messages in thread

* [PATCH v2] tpm: fix Atmel TPM crash caused by too frequent queries
  2021-06-30  4:22   ` [PATCH] tpm: fix ATMEL " Hao Wu
                       ` (2 preceding siblings ...)
  2021-07-07  4:31     ` [PATCH v2] " Hao Wu
@ 2021-07-09  4:40     ` Hao Wu
  2021-07-09 17:47       ` Jarkko Sakkinen
  2021-07-11  7:51       ` [PATCH v3] " Hao Wu
  3 siblings, 2 replies; 91+ messages in thread
From: Hao Wu @ 2021-07-09  4:40 UTC (permalink / raw)
  To: hao.wu, shrihari.kalkar, seungyeop.han, anish.jhaveri,
	peterhuewe, jarkko, jgg, linux-integrity, pmenzel, kgold, zohar,
	why2jjj.linux, hamza, gregkh, arnd, nayna, James.Bottomley

The Atmel TPM 1.2 chips crash with error
`tpm_try_transmit: send(): error -62` since kernel 4.14.
It is observed from the kernel log after running `tpm_sealdata -z`.
The error thrown from the command is as follows
```
$ tpm_sealdata -z
Tspi_Key_LoadKey failed: 0x00001087 - layer=tddl,
code=0087 (135), I/O error
```

The issue was reproduced with the following Atmel TPM chip:
```
$ tpm_version
T0  TPM 1.2 Version Info:
  Chip Version:        1.2.66.1
  Spec Level:          2
  Errata Revision:     3
  TPM Vendor ID:       ATML
  TPM Version:         01010000
  Manufacturer Info:   41544d4c
```

The root cause of the issue is due to the TPM calls to msleep()
were replaced with usleep_range() [1], which reduces
the actual timeout. Via experiments, it is observed that
the original msleep(5) actually sleeps for 15ms.
Because of a known timeout issue in Atmel TPM 1.2 chip,
the shorter timeout than 15ms can cause the error described above.

A few further changes in kernel 4.16 [2] and 4.18 [3, 4] further
reduced the timeout to less than 1ms. With experiments,
the problematic timeout in the latest kernel is the one
for `wait_for_tpm_stat`.

To fix it, the patch reverts the timeout of `wait_for_tpm_stat`
to 15ms for all Atmel TPM 1.2 chips, but leave it untouched
for Ateml TPM 2.0 chip, and chips from other vendors.
As explained above, the chosen 15ms timeout is
the actual timeout before this issue introduced,
thus the old value is used here.
Particularly, TPM_ATML_TIMEOUT_WAIT_STAT_MIN is set to 14700us,
TPM_ATML_TIMEOUT_WAIT_STAT_MIN is set to 15000us according to
the existing TPM_TIMEOUT_RANGE_US (300us).
The fixed has been tested in the system with the affected Atmel chip
with no issues observed after boot up.

References:
[1] 9f3fc7bcddcb tpm: replace msleep() with usleep_range() in TPM
1.2/2.0 generic drivers
[2] cf151a9a44d5 tpm: reduce tpm polling delay in tpm_tis_core
[3] 59f5a6b07f64 tpm: reduce poll sleep time in tpm_transmit()
[4] 424eaf910c32 tpm: reduce polling time to usecs for even finer
granularity

Fixes: 9f3fc7bcddcb ("tpm: replace msleep() with usleep_range() in TPM 1.2/2.0 generic drivers")
Link: https://patchwork.kernel.org/project/linux-integrity/patch/20200926223150.109645-1-hao.wu@rubrik.com/
Signed-off-by: Hao Wu <hao.wu@rubrik.com>
---
This version (v2) has following changes on top of the last (v1):
- follow the existing way to define two timeouts (min and max)
  for ATMEL chip, thus keep the exact timeout logic for 
  non-ATEML chips.
- limit the timeout increase to only ATMEL TPM 1.2 chips,
  because it is not an issue for TPM 2.0 chips yet.

Test Plan:
- Run fixed kernel with ATMEL TPM chips and see crash has been fixed.
- Run fixed kernel with non-ATMEL TPM chips, and confirm
  the timeout has not been changed.

 drivers/char/tpm/tpm.h          |  6 ++++--
 drivers/char/tpm/tpm_tis_core.c | 23 +++++++++++++++++++++--
 include/linux/tpm.h             |  3 +++
 3 files changed, 28 insertions(+), 4 deletions(-)

diff --git a/drivers/char/tpm/tpm.h b/drivers/char/tpm/tpm.h
index 283f78211c3a..6de1b44c4aab 100644
--- a/drivers/char/tpm/tpm.h
+++ b/drivers/char/tpm/tpm.h
@@ -41,8 +41,10 @@ enum tpm_timeout {
 	TPM_TIMEOUT_RETRY = 100, /* msecs */
 	TPM_TIMEOUT_RANGE_US = 300,	/* usecs */
 	TPM_TIMEOUT_POLL = 1,	/* msecs */
-	TPM_TIMEOUT_USECS_MIN = 100,      /* usecs */
-	TPM_TIMEOUT_USECS_MAX = 500      /* usecs */
+	TPM_TIMEOUT_USECS_MIN = 100,	/* usecs */
+	TPM_TIMEOUT_USECS_MAX = 500,	/* usecs */
+	TPM_ATML_TIMEOUT_WAIT_STAT_MIN = 14700,	/* usecs */
+	TPM_ATML_TIMEOUT_WAIT_STAT_MAX = 15000	/* usecs */
 };
 
 /* TPM addresses */
diff --git a/drivers/char/tpm/tpm_tis_core.c b/drivers/char/tpm/tpm_tis_core.c
index 55b9d3965ae1..ae27d66fdd94 100644
--- a/drivers/char/tpm/tpm_tis_core.c
+++ b/drivers/char/tpm/tpm_tis_core.c
@@ -80,8 +80,17 @@ static int wait_for_tpm_stat(struct tpm_chip *chip, u8 mask,
 		}
 	} else {
 		do {
-			usleep_range(TPM_TIMEOUT_USECS_MIN,
-				     TPM_TIMEOUT_USECS_MAX);
+			/* this code path could be executed before
+			 * timeouts initialized in chip instance.
+			 */
+			if (chip->timeout_wait_stat_min &&
+			    chip->timeout_wait_stat_max)
+				usleep_range(chip->timeout_wait_stat_min,
+					     chip->timeout_wait_stat_max);
+			else
+				usleep_range(TPM_TIMEOUT_USECS_MIN,
+					     TPM_TIMEOUT_USECS_MAX);
+
 			status = chip->ops->status(chip);
 			if ((status & mask) == mask)
 				return 0;
@@ -934,6 +943,9 @@ int tpm_tis_core_init(struct device *dev, struct tpm_tis_data *priv, int irq,
 	chip->timeout_b = msecs_to_jiffies(TIS_TIMEOUT_B_MAX);
 	chip->timeout_c = msecs_to_jiffies(TIS_TIMEOUT_C_MAX);
 	chip->timeout_d = msecs_to_jiffies(TIS_TIMEOUT_D_MAX);
+	/* init timeouts for wait_for_tpm_stat */
+	chip->timeout_wait_stat_min = TPM_TIMEOUT_USECS_MIN;
+	chip->timeout_wait_stat_max = TPM_TIMEOUT_USECS_MAX;
 	priv->phy_ops = phy_ops;
 	dev_set_drvdata(&chip->dev, priv);
 
@@ -983,6 +995,13 @@ int tpm_tis_core_init(struct device *dev, struct tpm_tis_data *priv, int irq,
 
 	priv->manufacturer_id = vendor;
 
+	if (priv->manufacturer_id == TPM_VID_ATML &&
+		!(chip->flags & TPM_CHIP_FLAG_TPM2)) {
+		/* If TPM chip is 1.2 ATMEL chip, timeout need to be relaxed*/
+		chip->timeout_wait_stat_min = TPM_ATML_TIMEOUT_WAIT_STAT_MIN;
+		chip->timeout_wait_stat_max = TPM_ATML_TIMEOUT_WAIT_STAT_MAX;
+	}
+
 	rc = tpm_tis_read8(priv, TPM_RID(0), &rid);
 	if (rc < 0)
 		goto out_err;
diff --git a/include/linux/tpm.h b/include/linux/tpm.h
index aa11fe323c56..171b9102c976 100644
--- a/include/linux/tpm.h
+++ b/include/linux/tpm.h
@@ -150,6 +150,8 @@ struct tpm_chip {
 	bool timeout_adjusted;
 	unsigned long duration[TPM_NUM_DURATIONS]; /* jiffies */
 	bool duration_adjusted;
+	unsigned int timeout_wait_stat_min; /* usecs */
+	unsigned int timeout_wait_stat_max; /* usecs */
 
 	struct dentry *bios_dir[TPM_NUM_EVENT_LOG_FILES];
 
@@ -269,6 +271,7 @@ enum tpm2_cc_attrs {
 #define TPM_VID_INTEL    0x8086
 #define TPM_VID_WINBOND  0x1050
 #define TPM_VID_STM      0x104A
+#define TPM_VID_ATML     0x1114
 
 enum tpm_chip_flags {
 	TPM_CHIP_FLAG_TPM2		= BIT(1),
-- 
2.29.0.vfs.0.0


^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: [PATCH v2] tpm: fix ATMEL TPM crash caused by too frequent queries
  2021-07-07 21:10           ` Jarkko Sakkinen
@ 2021-07-09  4:43             ` Hao Wu
  0 siblings, 0 replies; 91+ messages in thread
From: Hao Wu @ 2021-07-09  4:43 UTC (permalink / raw)
  To: Jarkko Sakkinen
  Cc: Shrihari Kalkar, Seungyeop Han, Anish Jhaveri, peterhuewe, jgg,
	linux-integrity, Paul Menzel, Ken Goldman, zohar, why2jjj.linux,
	Hamza Attak, gregkh, arnd, Nayna, James.Bottomley

> On Jul 7, 2021, at 2:10 PM, Jarkko Sakkinen <jarkko@kernel.org> wrote:
> 
> On Wed, Jul 07, 2021 at 11:28:35AM -0700, Hao Wu wrote:
>>> On Jul 7, 2021, at 2:24 AM, Jarkko Sakkinen <jarkko@kernel.org> wrote:
>>> 
>>> On Tue, Jul 06, 2021 at 09:31:35PM -0700, Hao Wu wrote:
>>>> Since kernel 4.14, there was a commit (9f3fc7bcddcb)
> 
> BTW, please remove this. You have a fixes tag.
> 
>>>> fixed the TPM sleep logic from msleep to usleep_range,
>>>> so that the TPM sleeps exactly with TPM_TIMEOUT (=5ms) afterward.
>>>> Before that fix, msleep(5) actually sleeps for around 15ms.
>>> 
>>> What is TPM sleep logic?
>> It is about the commit metnioned in the description
>> `tpm: replace msleep() with usleep_range() in TPM 1.2/2.0 generic drivers`
>> https://github.com/torvalds/linux/commit/9f3fc7bcddcb51234e23494531f93ab60475e1c3
> 
> What you should do is to explain in simple terms the unwanted behaviour
> that you are observing, and also, *when* you observe it. E.g. does this
> happen when you use /dev/tpm0, or is it visible already in klog at boot
> time. And also: does it occur for anything you put to /dev/tpm0, or is
> the bug triggering for some particular TPM commands.
> 
> You also need to have information what kind of Atmel chip triggers the
> bug. I'd presume that you have access to a machine with such chip?
> 
> When you get all that figured out, you should explain how you change
> the existing behaviour, and why it makes sense. E.g. if you fixup
> timeouts, please just tell how'd you end up choosing the values that
> you picked.
> 
> E.g. the rationale for that could come from testing and finding the "sweet
> spot", or perhaps the reason could be that old values worked, new ones
> don't.
> 
> Especially in bug fixes the reasoning is *at least* as important as the
> the code change itself because I need to know what is going on.
> 
> /Jarkko

Thanks Jarkko for pointing the direction! I have updated the description
and sent a new revision.

Hao


^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: [PATCH v2] tpm: fix Atmel TPM crash caused by too frequent queries
  2021-07-09  4:40     ` [PATCH v2] tpm: fix Atmel " Hao Wu
@ 2021-07-09 17:47       ` Jarkko Sakkinen
  2021-07-09 19:23         ` Hao Wu
  2021-07-11  7:51       ` [PATCH v3] " Hao Wu
  1 sibling, 1 reply; 91+ messages in thread
From: Jarkko Sakkinen @ 2021-07-09 17:47 UTC (permalink / raw)
  To: Hao Wu
  Cc: shrihari.kalkar, seungyeop.han, anish.jhaveri, peterhuewe, jgg,
	linux-integrity, pmenzel, kgold, zohar, why2jjj.linux, hamza,
	gregkh, arnd, nayna, James.Bottomley

On Thu, Jul 08, 2021 at 09:40:28PM -0700, Hao Wu wrote:
> The Atmel TPM 1.2 chips crash with error
> `tpm_try_transmit: send(): error -62` since kernel 4.14.
> It is observed from the kernel log after running `tpm_sealdata -z`.
> The error thrown from the command is as follows
> ```
> $ tpm_sealdata -z
> Tspi_Key_LoadKey failed: 0x00001087 - layer=tddl,
> code=0087 (135), I/O error
> ```
> 
> The issue was reproduced with the following Atmel TPM chip:
> ```
> $ tpm_version
> T0  TPM 1.2 Version Info:
>   Chip Version:        1.2.66.1
>   Spec Level:          2
>   Errata Revision:     3
>   TPM Vendor ID:       ATML
>   TPM Version:         01010000
>   Manufacturer Info:   41544d4c
> ```
> 
> The root cause of the issue is due to the TPM calls to msleep()
> were replaced with usleep_range() [1], which reduces
> the actual timeout. Via experiments, it is observed that
> the original msleep(5) actually sleeps for 15ms.
> Because of a known timeout issue in Atmel TPM 1.2 chip,
> the shorter timeout than 15ms can cause the error described above.
> 
> A few further changes in kernel 4.16 [2] and 4.18 [3, 4] further
> reduced the timeout to less than 1ms. With experiments,
> the problematic timeout in the latest kernel is the one
> for `wait_for_tpm_stat`.
> 
> To fix it, the patch reverts the timeout of `wait_for_tpm_stat`
> to 15ms for all Atmel TPM 1.2 chips, but leave it untouched
> for Ateml TPM 2.0 chip, and chips from other vendors.
> As explained above, the chosen 15ms timeout is
> the actual timeout before this issue introduced,
> thus the old value is used here.
> Particularly, TPM_ATML_TIMEOUT_WAIT_STAT_MIN is set to 14700us,
> TPM_ATML_TIMEOUT_WAIT_STAT_MIN is set to 15000us according to
> the existing TPM_TIMEOUT_RANGE_US (300us).
> The fixed has been tested in the system with the affected Atmel chip
> with no issues observed after boot up.
> 
> References:
> [1] 9f3fc7bcddcb tpm: replace msleep() with usleep_range() in TPM
> 1.2/2.0 generic drivers
> [2] cf151a9a44d5 tpm: reduce tpm polling delay in tpm_tis_core
> [3] 59f5a6b07f64 tpm: reduce poll sleep time in tpm_transmit()
> [4] 424eaf910c32 tpm: reduce polling time to usecs for even finer
> granularity
> 
> Fixes: 9f3fc7bcddcb ("tpm: replace msleep() with usleep_range() in TPM 1.2/2.0 generic drivers")
> Link: https://patchwork.kernel.org/project/linux-integrity/patch/20200926223150.109645-1-hao.wu@rubrik.com/
> Signed-off-by: Hao Wu <hao.wu@rubrik.com>
> ---
> This version (v2) has following changes on top of the last (v1):
> - follow the existing way to define two timeouts (min and max)
>   for ATMEL chip, thus keep the exact timeout logic for 
>   non-ATEML chips.
> - limit the timeout increase to only ATMEL TPM 1.2 chips,
>   because it is not an issue for TPM 2.0 chips yet.
> 
> Test Plan:
> - Run fixed kernel with ATMEL TPM chips and see crash has been fixed.
> - Run fixed kernel with non-ATMEL TPM chips, and confirm
>   the timeout has not been changed.
> 
>  drivers/char/tpm/tpm.h          |  6 ++++--
>  drivers/char/tpm/tpm_tis_core.c | 23 +++++++++++++++++++++--
>  include/linux/tpm.h             |  3 +++
>  3 files changed, 28 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/char/tpm/tpm.h b/drivers/char/tpm/tpm.h
> index 283f78211c3a..6de1b44c4aab 100644
> --- a/drivers/char/tpm/tpm.h
> +++ b/drivers/char/tpm/tpm.h
> @@ -41,8 +41,10 @@ enum tpm_timeout {
>  	TPM_TIMEOUT_RETRY = 100, /* msecs */
>  	TPM_TIMEOUT_RANGE_US = 300,	/* usecs */
>  	TPM_TIMEOUT_POLL = 1,	/* msecs */
> -	TPM_TIMEOUT_USECS_MIN = 100,      /* usecs */
> -	TPM_TIMEOUT_USECS_MAX = 500      /* usecs */
> +	TPM_TIMEOUT_USECS_MIN = 100,	/* usecs */
> +	TPM_TIMEOUT_USECS_MAX = 500,	/* usecs */
> +	TPM_ATML_TIMEOUT_WAIT_STAT_MIN = 14700,	/* usecs */
> +	TPM_ATML_TIMEOUT_WAIT_STAT_MAX = 15000	/* usecs */
>  };
>  
>  /* TPM addresses */
> diff --git a/drivers/char/tpm/tpm_tis_core.c b/drivers/char/tpm/tpm_tis_core.c
> index 55b9d3965ae1..ae27d66fdd94 100644
> --- a/drivers/char/tpm/tpm_tis_core.c
> +++ b/drivers/char/tpm/tpm_tis_core.c
> @@ -80,8 +80,17 @@ static int wait_for_tpm_stat(struct tpm_chip *chip, u8 mask,
>  		}
>  	} else {
>  		do {
> -			usleep_range(TPM_TIMEOUT_USECS_MIN,
> -				     TPM_TIMEOUT_USECS_MAX);
> +			/* this code path could be executed before
> +			 * timeouts initialized in chip instance.
> +			 */
> +			if (chip->timeout_wait_stat_min &&
> +			    chip->timeout_wait_stat_max)
> +				usleep_range(chip->timeout_wait_stat_min,
> +					     chip->timeout_wait_stat_max);
> +			else
> +				usleep_range(TPM_TIMEOUT_USECS_MIN,
> +					     TPM_TIMEOUT_USECS_MAX);

This starts to look otherwise fine but you don't need this condition.
Just initialize variables to TPM_TIMEOUT_USECS_{MIN, MAX} for non-Atmel.

/Jarkko

^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: [PATCH v2] tpm: fix Atmel TPM crash caused by too frequent queries
  2021-07-09 17:47       ` Jarkko Sakkinen
@ 2021-07-09 19:23         ` Hao Wu
  2021-07-11  7:37           ` Hao Wu
  0 siblings, 1 reply; 91+ messages in thread
From: Hao Wu @ 2021-07-09 19:23 UTC (permalink / raw)
  To: Jarkko Sakkinen
  Cc: Shrihari Kalkar, Seungyeop Han, Anish Jhaveri, peterhuewe, jgg,
	linux-integrity, Paul Menzel, Ken Goldman, zohar, why2jjj.linux,
	Hamza Attak, gregkh, arnd, Nayna, James.Bottomley

> On Jul 9, 2021, at 10:47 AM, Jarkko Sakkinen <jarkko@kernel.org> wrote:
> 
> On Thu, Jul 08, 2021 at 09:40:28PM -0700, Hao Wu wrote:
>> The Atmel TPM 1.2 chips crash with error
>> `tpm_try_transmit: send(): error -62` since kernel 4.14.
>> It is observed from the kernel log after running `tpm_sealdata -z`.
>> The error thrown from the command is as follows
>> ```
>> $ tpm_sealdata -z
>> Tspi_Key_LoadKey failed: 0x00001087 - layer=tddl,
>> code=0087 (135), I/O error
>> ```
>> 
>> The issue was reproduced with the following Atmel TPM chip:
>> ```
>> $ tpm_version
>> T0  TPM 1.2 Version Info:
>>  Chip Version:        1.2.66.1
>>  Spec Level:          2
>>  Errata Revision:     3
>>  TPM Vendor ID:       ATML
>>  TPM Version:         01010000
>>  Manufacturer Info:   41544d4c
>> ```
>> 
>> The root cause of the issue is due to the TPM calls to msleep()
>> were replaced with usleep_range() [1], which reduces
>> the actual timeout. Via experiments, it is observed that
>> the original msleep(5) actually sleeps for 15ms.
>> Because of a known timeout issue in Atmel TPM 1.2 chip,
>> the shorter timeout than 15ms can cause the error described above.
>> 
>> A few further changes in kernel 4.16 [2] and 4.18 [3, 4] further
>> reduced the timeout to less than 1ms. With experiments,
>> the problematic timeout in the latest kernel is the one
>> for `wait_for_tpm_stat`.
>> 
>> To fix it, the patch reverts the timeout of `wait_for_tpm_stat`
>> to 15ms for all Atmel TPM 1.2 chips, but leave it untouched
>> for Ateml TPM 2.0 chip, and chips from other vendors.
>> As explained above, the chosen 15ms timeout is
>> the actual timeout before this issue introduced,
>> thus the old value is used here.
>> Particularly, TPM_ATML_TIMEOUT_WAIT_STAT_MIN is set to 14700us,
>> TPM_ATML_TIMEOUT_WAIT_STAT_MIN is set to 15000us according to
>> the existing TPM_TIMEOUT_RANGE_US (300us).
>> The fixed has been tested in the system with the affected Atmel chip
>> with no issues observed after boot up.
>> 
>> References:
>> [1] 9f3fc7bcddcb tpm: replace msleep() with usleep_range() in TPM
>> 1.2/2.0 generic drivers
>> [2] cf151a9a44d5 tpm: reduce tpm polling delay in tpm_tis_core
>> [3] 59f5a6b07f64 tpm: reduce poll sleep time in tpm_transmit()
>> [4] 424eaf910c32 tpm: reduce polling time to usecs for even finer
>> granularity
>> 
>> Fixes: 9f3fc7bcddcb ("tpm: replace msleep() with usleep_range() in TPM 1.2/2.0 generic drivers")
>> Link: https://patchwork.kernel.org/project/linux-integrity/patch/20200926223150.109645-1-hao.wu@rubrik.com/
>> Signed-off-by: Hao Wu <hao.wu@rubrik.com>
>> ---
>> This version (v2) has following changes on top of the last (v1):
>> - follow the existing way to define two timeouts (min and max)
>>  for ATMEL chip, thus keep the exact timeout logic for 
>>  non-ATEML chips.
>> - limit the timeout increase to only ATMEL TPM 1.2 chips,
>>  because it is not an issue for TPM 2.0 chips yet.
>> 
>> Test Plan:
>> - Run fixed kernel with ATMEL TPM chips and see crash has been fixed.
>> - Run fixed kernel with non-ATMEL TPM chips, and confirm
>>  the timeout has not been changed.
>> 
>> drivers/char/tpm/tpm.h          |  6 ++++--
>> drivers/char/tpm/tpm_tis_core.c | 23 +++++++++++++++++++++--
>> include/linux/tpm.h             |  3 +++
>> 3 files changed, 28 insertions(+), 4 deletions(-)
>> 
>> diff --git a/drivers/char/tpm/tpm.h b/drivers/char/tpm/tpm.h
>> index 283f78211c3a..6de1b44c4aab 100644
>> --- a/drivers/char/tpm/tpm.h
>> +++ b/drivers/char/tpm/tpm.h
>> @@ -41,8 +41,10 @@ enum tpm_timeout {
>> 	TPM_TIMEOUT_RETRY = 100, /* msecs */
>> 	TPM_TIMEOUT_RANGE_US = 300,	/* usecs */
>> 	TPM_TIMEOUT_POLL = 1,	/* msecs */
>> -	TPM_TIMEOUT_USECS_MIN = 100,      /* usecs */
>> -	TPM_TIMEOUT_USECS_MAX = 500      /* usecs */
>> +	TPM_TIMEOUT_USECS_MIN = 100,	/* usecs */
>> +	TPM_TIMEOUT_USECS_MAX = 500,	/* usecs */
>> +	TPM_ATML_TIMEOUT_WAIT_STAT_MIN = 14700,	/* usecs */
>> +	TPM_ATML_TIMEOUT_WAIT_STAT_MAX = 15000	/* usecs */
>> };
>> 
>> /* TPM addresses */
>> diff --git a/drivers/char/tpm/tpm_tis_core.c b/drivers/char/tpm/tpm_tis_core.c
>> index 55b9d3965ae1..ae27d66fdd94 100644
>> --- a/drivers/char/tpm/tpm_tis_core.c
>> +++ b/drivers/char/tpm/tpm_tis_core.c
>> @@ -80,8 +80,17 @@ static int wait_for_tpm_stat(struct tpm_chip *chip, u8 mask,
>> 		}
>> 	} else {
>> 		do {
>> -			usleep_range(TPM_TIMEOUT_USECS_MIN,
>> -				     TPM_TIMEOUT_USECS_MAX);
>> +			/* this code path could be executed before
>> +			 * timeouts initialized in chip instance.
>> +			 */
>> +			if (chip->timeout_wait_stat_min &&
>> +			    chip->timeout_wait_stat_max)
>> +				usleep_range(chip->timeout_wait_stat_min,
>> +					     chip->timeout_wait_stat_max);
>> +			else
>> +				usleep_range(TPM_TIMEOUT_USECS_MIN,
>> +					     TPM_TIMEOUT_USECS_MAX);
> 
> This starts to look otherwise fine but you don't need this condition.
> Just initialize variables to TPM_TIMEOUT_USECS_{MIN, MAX} for non-Atmel.
Not sure I got your point or not. We have discussed this question a few rounds before,
I answered you about this. This check is required because before the time of 
Initialization in the code I added in `tpm_tis_core_init`
```
+	chip->timeout_wait_stat_min = TPM_TIMEOUT_USECS_MIN;
+	chip->timeout_wait_stat_max = TPM_TIMEOUT_USECS_MAX;
```
The func `wait_for_tpm_stat` runs, we need the condition to fall back to avoid system startup crash.

Let me know if this makes sense. If needed, I can do another confirm.

> /Jarkko

Hao


^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: [PATCH v2] tpm: fix Atmel TPM crash caused by too frequent queries
  2021-07-09 19:23         ` Hao Wu
@ 2021-07-11  7:37           ` Hao Wu
  2021-07-16  5:30             ` Hao Wu
  0 siblings, 1 reply; 91+ messages in thread
From: Hao Wu @ 2021-07-11  7:37 UTC (permalink / raw)
  To: Jarkko Sakkinen
  Cc: Shrihari Kalkar, Seungyeop Han, Anish Jhaveri, peterhuewe, jgg,
	linux-integrity, Paul Menzel, Ken Goldman, zohar, why2jjj.linux,
	Hamza Attak, gregkh, arnd, Nayna, James.Bottomley

> On Jul 9, 2021, at 12:23 PM, Hao Wu <hao.wu@rubrik.com> wrote:
> 
>> On Jul 9, 2021, at 10:47 AM, Jarkko Sakkinen <jarkko@kernel.org> wrote:
>> 
>> On Thu, Jul 08, 2021 at 09:40:28PM -0700, Hao Wu wrote:
>>> The Atmel TPM 1.2 chips crash with error
>>> `tpm_try_transmit: send(): error -62` since kernel 4.14.
>>> It is observed from the kernel log after running `tpm_sealdata -z`.
>>> The error thrown from the command is as follows
>>> ```
>>> $ tpm_sealdata -z
>>> Tspi_Key_LoadKey failed: 0x00001087 - layer=tddl,
>>> code=0087 (135), I/O error
>>> ```
>>> 
>>> The issue was reproduced with the following Atmel TPM chip:
>>> ```
>>> $ tpm_version
>>> T0  TPM 1.2 Version Info:
>>> Chip Version:        1.2.66.1
>>> Spec Level:          2
>>> Errata Revision:     3
>>> TPM Vendor ID:       ATML
>>> TPM Version:         01010000
>>> Manufacturer Info:   41544d4c
>>> ```
>>> 
>>> The root cause of the issue is due to the TPM calls to msleep()
>>> were replaced with usleep_range() [1], which reduces
>>> the actual timeout. Via experiments, it is observed that
>>> the original msleep(5) actually sleeps for 15ms.
>>> Because of a known timeout issue in Atmel TPM 1.2 chip,
>>> the shorter timeout than 15ms can cause the error described above.
>>> 
>>> A few further changes in kernel 4.16 [2] and 4.18 [3, 4] further
>>> reduced the timeout to less than 1ms. With experiments,
>>> the problematic timeout in the latest kernel is the one
>>> for `wait_for_tpm_stat`.
>>> 
>>> To fix it, the patch reverts the timeout of `wait_for_tpm_stat`
>>> to 15ms for all Atmel TPM 1.2 chips, but leave it untouched
>>> for Ateml TPM 2.0 chip, and chips from other vendors.
>>> As explained above, the chosen 15ms timeout is
>>> the actual timeout before this issue introduced,
>>> thus the old value is used here.
>>> Particularly, TPM_ATML_TIMEOUT_WAIT_STAT_MIN is set to 14700us,
>>> TPM_ATML_TIMEOUT_WAIT_STAT_MIN is set to 15000us according to
>>> the existing TPM_TIMEOUT_RANGE_US (300us).
>>> The fixed has been tested in the system with the affected Atmel chip
>>> with no issues observed after boot up.
>>> 
>>> References:
>>> [1] 9f3fc7bcddcb tpm: replace msleep() with usleep_range() in TPM
>>> 1.2/2.0 generic drivers
>>> [2] cf151a9a44d5 tpm: reduce tpm polling delay in tpm_tis_core
>>> [3] 59f5a6b07f64 tpm: reduce poll sleep time in tpm_transmit()
>>> [4] 424eaf910c32 tpm: reduce polling time to usecs for even finer
>>> granularity
>>> 
>>> Fixes: 9f3fc7bcddcb ("tpm: replace msleep() with usleep_range() in TPM 1.2/2.0 generic drivers")
>>> Link: https://patchwork.kernel.org/project/linux-integrity/patch/20200926223150.109645-1-hao.wu@rubrik.com/
>>> Signed-off-by: Hao Wu <hao.wu@rubrik.com>
>>> ---
>>> This version (v2) has following changes on top of the last (v1):
>>> - follow the existing way to define two timeouts (min and max)
>>> for ATMEL chip, thus keep the exact timeout logic for 
>>> non-ATEML chips.
>>> - limit the timeout increase to only ATMEL TPM 1.2 chips,
>>> because it is not an issue for TPM 2.0 chips yet.
>>> 
>>> Test Plan:
>>> - Run fixed kernel with ATMEL TPM chips and see crash has been fixed.
>>> - Run fixed kernel with non-ATMEL TPM chips, and confirm
>>> the timeout has not been changed.
>>> 
>>> drivers/char/tpm/tpm.h          |  6 ++++--
>>> drivers/char/tpm/tpm_tis_core.c | 23 +++++++++++++++++++++--
>>> include/linux/tpm.h             |  3 +++
>>> 3 files changed, 28 insertions(+), 4 deletions(-)
>>> 
>>> diff --git a/drivers/char/tpm/tpm.h b/drivers/char/tpm/tpm.h
>>> index 283f78211c3a..6de1b44c4aab 100644
>>> --- a/drivers/char/tpm/tpm.h
>>> +++ b/drivers/char/tpm/tpm.h
>>> @@ -41,8 +41,10 @@ enum tpm_timeout {
>>> 	TPM_TIMEOUT_RETRY = 100, /* msecs */
>>> 	TPM_TIMEOUT_RANGE_US = 300,	/* usecs */
>>> 	TPM_TIMEOUT_POLL = 1,	/* msecs */
>>> -	TPM_TIMEOUT_USECS_MIN = 100,      /* usecs */
>>> -	TPM_TIMEOUT_USECS_MAX = 500      /* usecs */
>>> +	TPM_TIMEOUT_USECS_MIN = 100,	/* usecs */
>>> +	TPM_TIMEOUT_USECS_MAX = 500,	/* usecs */
>>> +	TPM_ATML_TIMEOUT_WAIT_STAT_MIN = 14700,	/* usecs */
>>> +	TPM_ATML_TIMEOUT_WAIT_STAT_MAX = 15000	/* usecs */
>>> };
>>> 
>>> /* TPM addresses */
>>> diff --git a/drivers/char/tpm/tpm_tis_core.c b/drivers/char/tpm/tpm_tis_core.c
>>> index 55b9d3965ae1..ae27d66fdd94 100644
>>> --- a/drivers/char/tpm/tpm_tis_core.c
>>> +++ b/drivers/char/tpm/tpm_tis_core.c
>>> @@ -80,8 +80,17 @@ static int wait_for_tpm_stat(struct tpm_chip *chip, u8 mask,
>>> 		}
>>> 	} else {
>>> 		do {
>>> -			usleep_range(TPM_TIMEOUT_USECS_MIN,
>>> -				     TPM_TIMEOUT_USECS_MAX);
>>> +			/* this code path could be executed before
>>> +			 * timeouts initialized in chip instance.
>>> +			 */
>>> +			if (chip->timeout_wait_stat_min &&
>>> +			    chip->timeout_wait_stat_max)
>>> +				usleep_range(chip->timeout_wait_stat_min,
>>> +					     chip->timeout_wait_stat_max);
>>> +			else
>>> +				usleep_range(TPM_TIMEOUT_USECS_MIN,
>>> +					     TPM_TIMEOUT_USECS_MAX);
>> 
>> This starts to look otherwise fine but you don't need this condition.
>> Just initialize variables to TPM_TIMEOUT_USECS_{MIN, MAX} for non-Atmel.
> Not sure I got your point or not. We have discussed this question a few rounds before,
> I answered you about this. This check is required because before the time of 
> Initialization in the code I added in `tpm_tis_core_init`
> ```
> +	chip->timeout_wait_stat_min = TPM_TIMEOUT_USECS_MIN;
> +	chip->timeout_wait_stat_max = TPM_TIMEOUT_USECS_MAX;
> ```
> The func `wait_for_tpm_stat` runs, we need the condition to fall back to avoid system startup crash.
> 
> Let me know if this makes sense. If needed, I can do another confirm.
I double checked this, and found the current init lines in `tpm_tis_core_init` 
is actually before this code path now. Maybe it was an issue in one
of my old revision and I had the wrong impression. 
The condition seems ok to remove in the current revision. 

But I am not fully sure is if the behavior is consistent across other 1.2 chips, and TPM 2.0 chips.
Should we still keep the condition for robustness or ship without it ?  

>> /Jarkko
> 
> Hao

Hao


^ permalink raw reply	[flat|nested] 91+ messages in thread

* [PATCH v3] tpm: fix Atmel TPM crash caused by too frequent queries
  2021-07-09  4:40     ` [PATCH v2] tpm: fix Atmel " Hao Wu
  2021-07-09 17:47       ` Jarkko Sakkinen
@ 2021-07-11  7:51       ` Hao Wu
  2021-07-27  2:46         ` Jarkko Sakkinen
  2021-08-14 22:25         ` [PATCH v4] " Hao Wu
  1 sibling, 2 replies; 91+ messages in thread
From: Hao Wu @ 2021-07-11  7:51 UTC (permalink / raw)
  To: hao.wu, shrihari.kalkar, seungyeop.han, anish.jhaveri,
	peterhuewe, jarkko, jgg, linux-integrity, pmenzel, kgold, zohar,
	why2jjj.linux, hamza, gregkh, arnd, nayna, James.Bottomley

The Atmel TPM 1.2 chips crash with error
`tpm_try_transmit: send(): error -62` since kernel 4.14.
It is observed from the kernel log after running `tpm_sealdata -z`.
The error thrown from the command is as follows
```
$ tpm_sealdata -z
Tspi_Key_LoadKey failed: 0x00001087 - layer=tddl,
code=0087 (135), I/O error
```

The issue was reproduced with the following Atmel TPM chip:
```
$ tpm_version
T0  TPM 1.2 Version Info:
  Chip Version:        1.2.66.1
  Spec Level:          2
  Errata Revision:     3
  TPM Vendor ID:       ATML
  TPM Version:         01010000
  Manufacturer Info:   41544d4c
```

The root cause of the issue is due to the TPM calls to msleep()
were replaced with usleep_range() [1], which reduces
the actual timeout. Via experiments, it is observed that
the original msleep(5) actually sleeps for 15ms.
Because of a known timeout issue in Atmel TPM 1.2 chip,
the shorter timeout than 15ms can cause the error described above.

A few further changes in kernel 4.16 [2] and 4.18 [3, 4] further
reduced the timeout to less than 1ms. With experiments,
the problematic timeout in the latest kernel is the one
for `wait_for_tpm_stat`.

To fix it, the patch reverts the timeout of `wait_for_tpm_stat`
to 15ms for all Atmel TPM 1.2 chips, but leave it untouched
for Ateml TPM 2.0 chip, and chips from other vendors.
As explained above, the chosen 15ms timeout is
the actual timeout before this issue introduced,
thus the old value is used here.
Particularly, TPM_ATML_TIMEOUT_WAIT_STAT_MIN is set to 14700us,
TPM_ATML_TIMEOUT_WAIT_STAT_MIN is set to 15000us according to
the existing TPM_TIMEOUT_RANGE_US (300us).
The fixed has been tested in the system with the affected Atmel chip
with no issues observed after boot up.

References:
[1] 9f3fc7bcddcb tpm: replace msleep() with usleep_range() in TPM
1.2/2.0 generic drivers
[2] cf151a9a44d5 tpm: reduce tpm polling delay in tpm_tis_core
[3] 59f5a6b07f64 tpm: reduce poll sleep time in tpm_transmit()
[4] 424eaf910c32 tpm: reduce polling time to usecs for even finer
granularity

Fixes: 9f3fc7bcddcb ("tpm: replace msleep() with usleep_range() in TPM 1.2/2.0 generic drivers")
Link: https://patchwork.kernel.org/project/linux-integrity/patch/20200926223150.109645-1-hao.wu@rubrik.com/
Signed-off-by: Hao Wu <hao.wu@rubrik.com>
---
This version (v3) removes unnecessary condition check
in `wait_for_tpm_stat`.

Test Plan:
- Run fixed kernel with ATMEL TPM chips and see crash
has been fixed.
- Run fixed kernel with non-ATMEL TPM chips, and confirm
the timeout has not been changed.

drivers/char/tpm/tpm.h          |  6 ++++--
 drivers/char/tpm/tpm_tis_core.c | 14 ++++++++++++--
 include/linux/tpm.h             |  3 +++
 3 files changed, 19 insertions(+), 4 deletions(-)

diff --git a/drivers/char/tpm/tpm.h b/drivers/char/tpm/tpm.h
index 283f78211c3a..6de1b44c4aab 100644
--- a/drivers/char/tpm/tpm.h
+++ b/drivers/char/tpm/tpm.h
@@ -41,8 +41,10 @@ enum tpm_timeout {
 	TPM_TIMEOUT_RETRY = 100, /* msecs */
 	TPM_TIMEOUT_RANGE_US = 300,	/* usecs */
 	TPM_TIMEOUT_POLL = 1,	/* msecs */
-	TPM_TIMEOUT_USECS_MIN = 100,      /* usecs */
-	TPM_TIMEOUT_USECS_MAX = 500      /* usecs */
+	TPM_TIMEOUT_USECS_MIN = 100,	/* usecs */
+	TPM_TIMEOUT_USECS_MAX = 500,	/* usecs */
+	TPM_ATML_TIMEOUT_WAIT_STAT_MIN = 14700,	/* usecs */
+	TPM_ATML_TIMEOUT_WAIT_STAT_MAX = 15000	/* usecs */
 };
 
 /* TPM addresses */
diff --git a/drivers/char/tpm/tpm_tis_core.c b/drivers/char/tpm/tpm_tis_core.c
index 55b9d3965ae1..2de1f71e8ae1 100644
--- a/drivers/char/tpm/tpm_tis_core.c
+++ b/drivers/char/tpm/tpm_tis_core.c
@@ -80,8 +80,8 @@ static int wait_for_tpm_stat(struct tpm_chip *chip, u8 mask,
 		}
 	} else {
 		do {
-			usleep_range(TPM_TIMEOUT_USECS_MIN,
-				     TPM_TIMEOUT_USECS_MAX);
+			usleep_range(chip->timeout_wait_stat_min,
+				     chip->timeout_wait_stat_max);
 			status = chip->ops->status(chip);
 			if ((status & mask) == mask)
 				return 0;
@@ -934,6 +934,9 @@ int tpm_tis_core_init(struct device *dev, struct tpm_tis_data *priv, int irq,
 	chip->timeout_b = msecs_to_jiffies(TIS_TIMEOUT_B_MAX);
 	chip->timeout_c = msecs_to_jiffies(TIS_TIMEOUT_C_MAX);
 	chip->timeout_d = msecs_to_jiffies(TIS_TIMEOUT_D_MAX);
+	/* init timeouts for wait_for_tpm_stat */
+	chip->timeout_wait_stat_min = TPM_TIMEOUT_USECS_MIN;
+	chip->timeout_wait_stat_max = TPM_TIMEOUT_USECS_MAX;
 	priv->phy_ops = phy_ops;
 	dev_set_drvdata(&chip->dev, priv);
 
@@ -983,6 +986,13 @@ int tpm_tis_core_init(struct device *dev, struct tpm_tis_data *priv, int irq,
 
 	priv->manufacturer_id = vendor;
 
+	if (priv->manufacturer_id == TPM_VID_ATML &&
+		!(chip->flags & TPM_CHIP_FLAG_TPM2)) {
+		/* If TPM chip is 1.2 ATMEL chip, timeout need to be relaxed*/
+		chip->timeout_wait_stat_min = TPM_ATML_TIMEOUT_WAIT_STAT_MIN;
+		chip->timeout_wait_stat_max = TPM_ATML_TIMEOUT_WAIT_STAT_MAX;
+	}
+
 	rc = tpm_tis_read8(priv, TPM_RID(0), &rid);
 	if (rc < 0)
 		goto out_err;
diff --git a/include/linux/tpm.h b/include/linux/tpm.h
index aa11fe323c56..171b9102c976 100644
--- a/include/linux/tpm.h
+++ b/include/linux/tpm.h
@@ -150,6 +150,8 @@ struct tpm_chip {
 	bool timeout_adjusted;
 	unsigned long duration[TPM_NUM_DURATIONS]; /* jiffies */
 	bool duration_adjusted;
+	unsigned int timeout_wait_stat_min; /* usecs */
+	unsigned int timeout_wait_stat_max; /* usecs */
 
 	struct dentry *bios_dir[TPM_NUM_EVENT_LOG_FILES];
 
@@ -269,6 +271,7 @@ enum tpm2_cc_attrs {
 #define TPM_VID_INTEL    0x8086
 #define TPM_VID_WINBOND  0x1050
 #define TPM_VID_STM      0x104A
+#define TPM_VID_ATML     0x1114
 
 enum tpm_chip_flags {
 	TPM_CHIP_FLAG_TPM2		= BIT(1),
-- 
2.29.0.vfs.0.0


^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: [PATCH v2] tpm: fix Atmel TPM crash caused by too frequent queries
  2021-07-11  7:37           ` Hao Wu
@ 2021-07-16  5:30             ` Hao Wu
  0 siblings, 0 replies; 91+ messages in thread
From: Hao Wu @ 2021-07-16  5:30 UTC (permalink / raw)
  To: Jarkko Sakkinen
  Cc: Shrihari Kalkar, Seungyeop Han, Anish Jhaveri, peterhuewe, jgg,
	linux-integrity, Paul Menzel, Ken Goldman, zohar, why2jjj.linux,
	Hamza Attak, gregkh, arnd, Nayna, James.Bottomley

> On Jul 11, 2021, at 12:37 AM, Hao Wu <hao.wu@rubrik.com> wrote:
> 
>> On Jul 9, 2021, at 12:23 PM, Hao Wu <hao.wu@rubrik.com> wrote:
>> 
>>> On Jul 9, 2021, at 10:47 AM, Jarkko Sakkinen <jarkko@kernel.org> wrote:
>>> 
>>> On Thu, Jul 08, 2021 at 09:40:28PM -0700, Hao Wu wrote:
>>>> The Atmel TPM 1.2 chips crash with error
>>>> `tpm_try_transmit: send(): error -62` since kernel 4.14.
>>>> It is observed from the kernel log after running `tpm_sealdata -z`.
>>>> The error thrown from the command is as follows
>>>> ```
>>>> $ tpm_sealdata -z
>>>> Tspi_Key_LoadKey failed: 0x00001087 - layer=tddl,
>>>> code=0087 (135), I/O error
>>>> ```
>>>> 
>>>> The issue was reproduced with the following Atmel TPM chip:
>>>> ```
>>>> $ tpm_version
>>>> T0  TPM 1.2 Version Info:
>>>> Chip Version:        1.2.66.1
>>>> Spec Level:          2
>>>> Errata Revision:     3
>>>> TPM Vendor ID:       ATML
>>>> TPM Version:         01010000
>>>> Manufacturer Info:   41544d4c
>>>> ```
>>>> 
>>>> The root cause of the issue is due to the TPM calls to msleep()
>>>> were replaced with usleep_range() [1], which reduces
>>>> the actual timeout. Via experiments, it is observed that
>>>> the original msleep(5) actually sleeps for 15ms.
>>>> Because of a known timeout issue in Atmel TPM 1.2 chip,
>>>> the shorter timeout than 15ms can cause the error described above.
>>>> 
>>>> A few further changes in kernel 4.16 [2] and 4.18 [3, 4] further
>>>> reduced the timeout to less than 1ms. With experiments,
>>>> the problematic timeout in the latest kernel is the one
>>>> for `wait_for_tpm_stat`.
>>>> 
>>>> To fix it, the patch reverts the timeout of `wait_for_tpm_stat`
>>>> to 15ms for all Atmel TPM 1.2 chips, but leave it untouched
>>>> for Ateml TPM 2.0 chip, and chips from other vendors.
>>>> As explained above, the chosen 15ms timeout is
>>>> the actual timeout before this issue introduced,
>>>> thus the old value is used here.
>>>> Particularly, TPM_ATML_TIMEOUT_WAIT_STAT_MIN is set to 14700us,
>>>> TPM_ATML_TIMEOUT_WAIT_STAT_MIN is set to 15000us according to
>>>> the existing TPM_TIMEOUT_RANGE_US (300us).
>>>> The fixed has been tested in the system with the affected Atmel chip
>>>> with no issues observed after boot up.
>>>> 
>>>> References:
>>>> [1] 9f3fc7bcddcb tpm: replace msleep() with usleep_range() in TPM
>>>> 1.2/2.0 generic drivers
>>>> [2] cf151a9a44d5 tpm: reduce tpm polling delay in tpm_tis_core
>>>> [3] 59f5a6b07f64 tpm: reduce poll sleep time in tpm_transmit()
>>>> [4] 424eaf910c32 tpm: reduce polling time to usecs for even finer
>>>> granularity
>>>> 
>>>> Fixes: 9f3fc7bcddcb ("tpm: replace msleep() with usleep_range() in TPM 1.2/2.0 generic drivers")
>>>> Link: https://patchwork.kernel.org/project/linux-integrity/patch/20200926223150.109645-1-hao.wu@rubrik.com/
>>>> Signed-off-by: Hao Wu <hao.wu@rubrik.com>
>>>> ---
>>>> This version (v2) has following changes on top of the last (v1):
>>>> - follow the existing way to define two timeouts (min and max)
>>>> for ATMEL chip, thus keep the exact timeout logic for 
>>>> non-ATEML chips.
>>>> - limit the timeout increase to only ATMEL TPM 1.2 chips,
>>>> because it is not an issue for TPM 2.0 chips yet.
>>>> 
>>>> Test Plan:
>>>> - Run fixed kernel with ATMEL TPM chips and see crash has been fixed.
>>>> - Run fixed kernel with non-ATMEL TPM chips, and confirm
>>>> the timeout has not been changed.
>>>> 
>>>> drivers/char/tpm/tpm.h          |  6 ++++--
>>>> drivers/char/tpm/tpm_tis_core.c | 23 +++++++++++++++++++++--
>>>> include/linux/tpm.h             |  3 +++
>>>> 3 files changed, 28 insertions(+), 4 deletions(-)
>>>> 
>>>> diff --git a/drivers/char/tpm/tpm.h b/drivers/char/tpm/tpm.h
>>>> index 283f78211c3a..6de1b44c4aab 100644
>>>> --- a/drivers/char/tpm/tpm.h
>>>> +++ b/drivers/char/tpm/tpm.h
>>>> @@ -41,8 +41,10 @@ enum tpm_timeout {
>>>> 	TPM_TIMEOUT_RETRY = 100, /* msecs */
>>>> 	TPM_TIMEOUT_RANGE_US = 300,	/* usecs */
>>>> 	TPM_TIMEOUT_POLL = 1,	/* msecs */
>>>> -	TPM_TIMEOUT_USECS_MIN = 100,      /* usecs */
>>>> -	TPM_TIMEOUT_USECS_MAX = 500      /* usecs */
>>>> +	TPM_TIMEOUT_USECS_MIN = 100,	/* usecs */
>>>> +	TPM_TIMEOUT_USECS_MAX = 500,	/* usecs */
>>>> +	TPM_ATML_TIMEOUT_WAIT_STAT_MIN = 14700,	/* usecs */
>>>> +	TPM_ATML_TIMEOUT_WAIT_STAT_MAX = 15000	/* usecs */
>>>> };
>>>> 
>>>> /* TPM addresses */
>>>> diff --git a/drivers/char/tpm/tpm_tis_core.c b/drivers/char/tpm/tpm_tis_core.c
>>>> index 55b9d3965ae1..ae27d66fdd94 100644
>>>> --- a/drivers/char/tpm/tpm_tis_core.c
>>>> +++ b/drivers/char/tpm/tpm_tis_core.c
>>>> @@ -80,8 +80,17 @@ static int wait_for_tpm_stat(struct tpm_chip *chip, u8 mask,
>>>> 		}
>>>> 	} else {
>>>> 		do {
>>>> -			usleep_range(TPM_TIMEOUT_USECS_MIN,
>>>> -				     TPM_TIMEOUT_USECS_MAX);
>>>> +			/* this code path could be executed before
>>>> +			 * timeouts initialized in chip instance.
>>>> +			 */
>>>> +			if (chip->timeout_wait_stat_min &&
>>>> +			    chip->timeout_wait_stat_max)
>>>> +				usleep_range(chip->timeout_wait_stat_min,
>>>> +					     chip->timeout_wait_stat_max);
>>>> +			else
>>>> +				usleep_range(TPM_TIMEOUT_USECS_MIN,
>>>> +					     TPM_TIMEOUT_USECS_MAX);
>>> 
>>> This starts to look otherwise fine but you don't need this condition.
>>> Just initialize variables to TPM_TIMEOUT_USECS_{MIN, MAX} for non-Atmel.
>> Not sure I got your point or not. We have discussed this question a few rounds before,
>> I answered you about this. This check is required because before the time of 
>> Initialization in the code I added in `tpm_tis_core_init`
>> ```
>> +	chip->timeout_wait_stat_min = TPM_TIMEOUT_USECS_MIN;
>> +	chip->timeout_wait_stat_max = TPM_TIMEOUT_USECS_MAX;
>> ```
>> The func `wait_for_tpm_stat` runs, we need the condition to fall back to avoid system startup crash.
>> 
>> Let me know if this makes sense. If needed, I can do another confirm.
> I double checked this, and found the current init lines in `tpm_tis_core_init` 
> is actually before this code path now. Maybe it was an issue in one
> of my old revision and I had the wrong impression. 
> The condition seems ok to remove in the current revision. 
> 
> But I am not fully sure is if the behavior is consistent across other 1.2 chips, and TPM 2.0 chips.
> Should we still keep the condition for robustness or ship without it ?  
> 
This has been updated in a v3 patch 
https://patchwork.kernel.org/project/linux-integrity/patch/20210711075122.30056-1-hao.wu@rubrik.com/

Let me know if that is preferred. I tested in both atmel and non-atmel machine. Works fine so far.

>>> /Jarkko
>> 
>> Hao
> 
> Hao

Hao


^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: [PATCH v3] tpm: fix Atmel TPM crash caused by too frequent queries
  2021-07-11  7:51       ` [PATCH v3] " Hao Wu
@ 2021-07-27  2:46         ` Jarkko Sakkinen
  2021-07-27  3:40           ` Hao Wu
  2021-08-14 22:25         ` [PATCH v4] " Hao Wu
  1 sibling, 1 reply; 91+ messages in thread
From: Jarkko Sakkinen @ 2021-07-27  2:46 UTC (permalink / raw)
  To: Hao Wu
  Cc: shrihari.kalkar, seungyeop.han, anish.jhaveri, peterhuewe, jgg,
	linux-integrity, pmenzel, kgold, zohar, why2jjj.linux, hamza,
	gregkh, arnd, nayna, James.Bottomley

On Sun, Jul 11, 2021 at 12:51:22AM -0700, Hao Wu wrote:
> The Atmel TPM 1.2 chips crash with error
> `tpm_try_transmit: send(): error -62` since kernel 4.14.
> It is observed from the kernel log after running `tpm_sealdata -z`.
> The error thrown from the command is as follows
> ```
> $ tpm_sealdata -z
> Tspi_Key_LoadKey failed: 0x00001087 - layer=tddl,
> code=0087 (135), I/O error
> ```
> 
> The issue was reproduced with the following Atmel TPM chip:
> ```
> $ tpm_version
> T0  TPM 1.2 Version Info:
>   Chip Version:        1.2.66.1
>   Spec Level:          2
>   Errata Revision:     3
>   TPM Vendor ID:       ATML
>   TPM Version:         01010000
>   Manufacturer Info:   41544d4c
> ```
> 
> The root cause of the issue is due to the TPM calls to msleep()
> were replaced with usleep_range() [1], which reduces
> the actual timeout. Via experiments, it is observed that
> the original msleep(5) actually sleeps for 15ms.
> Because of a known timeout issue in Atmel TPM 1.2 chip,
> the shorter timeout than 15ms can cause the error described above.
> 
> A few further changes in kernel 4.16 [2] and 4.18 [3, 4] further
> reduced the timeout to less than 1ms. With experiments,
> the problematic timeout in the latest kernel is the one
> for `wait_for_tpm_stat`.
> 
> To fix it, the patch reverts the timeout of `wait_for_tpm_stat`
> to 15ms for all Atmel TPM 1.2 chips, but leave it untouched
> for Ateml TPM 2.0 chip, and chips from other vendors.
> As explained above, the chosen 15ms timeout is
> the actual timeout before this issue introduced,
> thus the old value is used here.
> Particularly, TPM_ATML_TIMEOUT_WAIT_STAT_MIN is set to 14700us,
> TPM_ATML_TIMEOUT_WAIT_STAT_MIN is set to 15000us according to
> the existing TPM_TIMEOUT_RANGE_US (300us).
> The fixed has been tested in the system with the affected Atmel chip
> with no issues observed after boot up.
> 
> References:
> [1] 9f3fc7bcddcb tpm: replace msleep() with usleep_range() in TPM
> 1.2/2.0 generic drivers
> [2] cf151a9a44d5 tpm: reduce tpm polling delay in tpm_tis_core
> [3] 59f5a6b07f64 tpm: reduce poll sleep time in tpm_transmit()
> [4] 424eaf910c32 tpm: reduce polling time to usecs for even finer
> granularity
> 
> Fixes: 9f3fc7bcddcb ("tpm: replace msleep() with usleep_range() in TPM 1.2/2.0 generic drivers")
> Link: https://patchwork.kernel.org/project/linux-integrity/patch/20200926223150.109645-1-hao.wu@rubrik.com/
> Signed-off-by: Hao Wu <hao.wu@rubrik.com>
> ---
> This version (v3) removes unnecessary condition check
> in `wait_for_tpm_stat`.

Missing change long v1 -> v2.

Please do something like

v3:
- ...

v2:
- ...

> 
> Test Plan:
> - Run fixed kernel with ATMEL TPM chips and see crash
> has been fixed.
> - Run fixed kernel with non-ATMEL TPM chips, and confirm
> the timeout has not been changed.
> 
> drivers/char/tpm/tpm.h          |  6 ++++--
>  drivers/char/tpm/tpm_tis_core.c | 14 ++++++++++++--
>  include/linux/tpm.h             |  3 +++
>  3 files changed, 19 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/char/tpm/tpm.h b/drivers/char/tpm/tpm.h
> index 283f78211c3a..6de1b44c4aab 100644
> --- a/drivers/char/tpm/tpm.h
> +++ b/drivers/char/tpm/tpm.h
> @@ -41,8 +41,10 @@ enum tpm_timeout {
>  	TPM_TIMEOUT_RETRY = 100, /* msecs */
>  	TPM_TIMEOUT_RANGE_US = 300,	/* usecs */
>  	TPM_TIMEOUT_POLL = 1,	/* msecs */
> -	TPM_TIMEOUT_USECS_MIN = 100,      /* usecs */
> -	TPM_TIMEOUT_USECS_MAX = 500      /* usecs */
> +	TPM_TIMEOUT_USECS_MIN = 100,	/* usecs */
> +	TPM_TIMEOUT_USECS_MAX = 500,	/* usecs */

What is going on here?

These lines should not change.

> +	TPM_ATML_TIMEOUT_WAIT_STAT_MIN = 14700,	/* usecs */
> +	TPM_ATML_TIMEOUT_WAIT_STAT_MAX = 15000	/* usecs */

Move these definitions to tpm_tis_core.h. They are only useful
for a single driver.


>  };
>  
>  /* TPM addresses */
> diff --git a/drivers/char/tpm/tpm_tis_core.c b/drivers/char/tpm/tpm_tis_core.c
> index 55b9d3965ae1..2de1f71e8ae1 100644
> --- a/drivers/char/tpm/tpm_tis_core.c
> +++ b/drivers/char/tpm/tpm_tis_core.c
> @@ -80,8 +80,8 @@ static int wait_for_tpm_stat(struct tpm_chip *chip, u8 mask,
>  		}
>  	} else {
>  		do {
> -			usleep_range(TPM_TIMEOUT_USECS_MIN,
> -				     TPM_TIMEOUT_USECS_MAX);
> +			usleep_range(chip->timeout_wait_stat_min,
> +				     chip->timeout_wait_stat_max);
>  			status = chip->ops->status(chip);
>  			if ((status & mask) == mask)
>  				return 0;
> @@ -934,6 +934,9 @@ int tpm_tis_core_init(struct device *dev, struct tpm_tis_data *priv, int irq,
>  	chip->timeout_b = msecs_to_jiffies(TIS_TIMEOUT_B_MAX);
>  	chip->timeout_c = msecs_to_jiffies(TIS_TIMEOUT_C_MAX);
>  	chip->timeout_d = msecs_to_jiffies(TIS_TIMEOUT_D_MAX);
> +	/* init timeouts for wait_for_tpm_stat */

Remove this comment.

> +	chip->timeout_wait_stat_min = TPM_TIMEOUT_USECS_MIN;
> +	chip->timeout_wait_stat_max = TPM_TIMEOUT_USECS_MAX;
>  	priv->phy_ops = phy_ops;
>  	dev_set_drvdata(&chip->dev, priv);
>  
> @@ -983,6 +986,13 @@ int tpm_tis_core_init(struct device *dev, struct tpm_tis_data *priv, int irq,
>  
>  	priv->manufacturer_id = vendor;
>  
> +	if (priv->manufacturer_id == TPM_VID_ATML &&
> +		!(chip->flags & TPM_CHIP_FLAG_TPM2)) {
> +		/* If TPM chip is 1.2 ATMEL chip, timeout need to be relaxed*/
> +		chip->timeout_wait_stat_min = TPM_ATML_TIMEOUT_WAIT_STAT_MIN;
> +		chip->timeout_wait_stat_max = TPM_ATML_TIMEOUT_WAIT_STAT_MAX;
> +	}
> +
>  	rc = tpm_tis_read8(priv, TPM_RID(0), &rid);
>  	if (rc < 0)
>  		goto out_err;
> diff --git a/include/linux/tpm.h b/include/linux/tpm.h
> index aa11fe323c56..171b9102c976 100644
> --- a/include/linux/tpm.h
> +++ b/include/linux/tpm.h
> @@ -150,6 +150,8 @@ struct tpm_chip {
>  	bool timeout_adjusted;
>  	unsigned long duration[TPM_NUM_DURATIONS]; /* jiffies */
>  	bool duration_adjusted;
> +	unsigned int timeout_wait_stat_min; /* usecs */
> +	unsigned int timeout_wait_stat_max; /* usecs */
>  
>  	struct dentry *bios_dir[TPM_NUM_EVENT_LOG_FILES];
>  
> @@ -269,6 +271,7 @@ enum tpm2_cc_attrs {
>  #define TPM_VID_INTEL    0x8086
>  #define TPM_VID_WINBOND  0x1050
>  #define TPM_VID_STM      0x104A
> +#define TPM_VID_ATML     0x1114
>  
>  enum tpm_chip_flags {
>  	TPM_CHIP_FLAG_TPM2		= BIT(1),
> -- 
> 2.29.0.vfs.0.0
> 
> 

/Jarkko

^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: [PATCH v3] tpm: fix Atmel TPM crash caused by too frequent queries
  2021-07-27  2:46         ` Jarkko Sakkinen
@ 2021-07-27  3:40           ` Hao Wu
  0 siblings, 0 replies; 91+ messages in thread
From: Hao Wu @ 2021-07-27  3:40 UTC (permalink / raw)
  To: Jarkko Sakkinen
  Cc: Shrihari Kalkar, Seungyeop Han, Anish Jhaveri, peterhuewe, jgg,
	linux-integrity, pmenzel, kgold, zohar, why2jjj.linux, hamza,
	gregkh, arnd, nayna, James.Bottomley

> On Jul 26, 2021, at 7:46 PM, Jarkko Sakkinen <jarkko@kernel.org> wrote:
> 
> On Sun, Jul 11, 2021 at 12:51:22AM -0700, Hao Wu wrote:
>> The Atmel TPM 1.2 chips crash with error
>> `tpm_try_transmit: send(): error -62` since kernel 4.14.
>> It is observed from the kernel log after running `tpm_sealdata -z`.
>> The error thrown from the command is as follows
>> ```
>> $ tpm_sealdata -z
>> Tspi_Key_LoadKey failed: 0x00001087 - layer=tddl,
>> code=0087 (135), I/O error
>> ```
>> 
>> The issue was reproduced with the following Atmel TPM chip:
>> ```
>> $ tpm_version
>> T0  TPM 1.2 Version Info:
>>  Chip Version:        1.2.66.1
>>  Spec Level:          2
>>  Errata Revision:     3
>>  TPM Vendor ID:       ATML
>>  TPM Version:         01010000
>>  Manufacturer Info:   41544d4c
>> ```
>> 
>> The root cause of the issue is due to the TPM calls to msleep()
>> were replaced with usleep_range() [1], which reduces
>> the actual timeout. Via experiments, it is observed that
>> the original msleep(5) actually sleeps for 15ms.
>> Because of a known timeout issue in Atmel TPM 1.2 chip,
>> the shorter timeout than 15ms can cause the error described above.
>> 
>> A few further changes in kernel 4.16 [2] and 4.18 [3, 4] further
>> reduced the timeout to less than 1ms. With experiments,
>> the problematic timeout in the latest kernel is the one
>> for `wait_for_tpm_stat`.
>> 
>> To fix it, the patch reverts the timeout of `wait_for_tpm_stat`
>> to 15ms for all Atmel TPM 1.2 chips, but leave it untouched
>> for Ateml TPM 2.0 chip, and chips from other vendors.
>> As explained above, the chosen 15ms timeout is
>> the actual timeout before this issue introduced,
>> thus the old value is used here.
>> Particularly, TPM_ATML_TIMEOUT_WAIT_STAT_MIN is set to 14700us,
>> TPM_ATML_TIMEOUT_WAIT_STAT_MIN is set to 15000us according to
>> the existing TPM_TIMEOUT_RANGE_US (300us).
>> The fixed has been tested in the system with the affected Atmel chip
>> with no issues observed after boot up.
>> 
>> References:
>> [1] 9f3fc7bcddcb tpm: replace msleep() with usleep_range() in TPM
>> 1.2/2.0 generic drivers
>> [2] cf151a9a44d5 tpm: reduce tpm polling delay in tpm_tis_core
>> [3] 59f5a6b07f64 tpm: reduce poll sleep time in tpm_transmit()
>> [4] 424eaf910c32 tpm: reduce polling time to usecs for even finer
>> granularity
>> 
>> Fixes: 9f3fc7bcddcb ("tpm: replace msleep() with usleep_range() in TPM 1.2/2.0 generic drivers")
>> Link: https://patchwork.kernel.org/project/linux-integrity/patch/20200926223150.109645-1-hao.wu@rubrik.com/
>> Signed-off-by: Hao Wu <hao.wu@rubrik.com>
>> ---
>> This version (v3) removes unnecessary condition check
>> in `wait_for_tpm_stat`.
> 
> Missing change long v1 -> v2.
> 
> Please do something like
> 
> v3:
> - ...
> 
> v2:
> - ...
Ok, I thought it is chained. I will add all changes

>> 
>> Test Plan:
>> - Run fixed kernel with ATMEL TPM chips and see crash
>> has been fixed.
>> - Run fixed kernel with non-ATMEL TPM chips, and confirm
>> the timeout has not been changed.
>> 
>> drivers/char/tpm/tpm.h          |  6 ++++--
>> drivers/char/tpm/tpm_tis_core.c | 14 ++++++++++++--
>> include/linux/tpm.h             |  3 +++
>> 3 files changed, 19 insertions(+), 4 deletions(-)
>> 
>> diff --git a/drivers/char/tpm/tpm.h b/drivers/char/tpm/tpm.h
>> index 283f78211c3a..6de1b44c4aab 100644
>> --- a/drivers/char/tpm/tpm.h
>> +++ b/drivers/char/tpm/tpm.h
>> @@ -41,8 +41,10 @@ enum tpm_timeout {
>> 	TPM_TIMEOUT_RETRY = 100, /* msecs */
>> 	TPM_TIMEOUT_RANGE_US = 300,	/* usecs */
>> 	TPM_TIMEOUT_POLL = 1,	/* msecs */
>> -	TPM_TIMEOUT_USECS_MIN = 100,      /* usecs */
>> -	TPM_TIMEOUT_USECS_MAX = 500      /* usecs */
>> +	TPM_TIMEOUT_USECS_MIN = 100,	/* usecs */
The spaces of this this line is wrong, it should use tab instead of spaces before the `/*`. 
Fixing this by the way here and align the comment 
>> +	TPM_TIMEOUT_USECS_MAX = 500,	/* usecs */
You need to add tailing comma, don’t we?
> What is going on here?
> 
> These lines should not change.

> 
>> +	TPM_ATML_TIMEOUT_WAIT_STAT_MIN = 14700,	/* usecs */
>> +	TPM_ATML_TIMEOUT_WAIT_STAT_MAX = 15000	/* usecs */
> 
> Move these definitions to tpm_tis_core.h. They are only useful
> for a single driver.
I thought putting them along with the original MIN / MAX is easier to understand for code reader. 
Let me know if you have strong opinion though.

> 
>> };
>> 
>> /* TPM addresses */
>> diff --git a/drivers/char/tpm/tpm_tis_core.c b/drivers/char/tpm/tpm_tis_core.c
>> index 55b9d3965ae1..2de1f71e8ae1 100644
>> --- a/drivers/char/tpm/tpm_tis_core.c
>> +++ b/drivers/char/tpm/tpm_tis_core.c
>> @@ -80,8 +80,8 @@ static int wait_for_tpm_stat(struct tpm_chip *chip, u8 mask,
>> 		}
>> 	} else {
>> 		do {
>> -			usleep_range(TPM_TIMEOUT_USECS_MIN,
>> -				     TPM_TIMEOUT_USECS_MAX);
>> +			usleep_range(chip->timeout_wait_stat_min,
>> +				     chip->timeout_wait_stat_max);
>> 			status = chip->ops->status(chip);
>> 			if ((status & mask) == mask)
>> 				return 0;
>> @@ -934,6 +934,9 @@ int tpm_tis_core_init(struct device *dev, struct tpm_tis_data *priv, int irq,
>> 	chip->timeout_b = msecs_to_jiffies(TIS_TIMEOUT_B_MAX);
>> 	chip->timeout_c = msecs_to_jiffies(TIS_TIMEOUT_C_MAX);
>> 	chip->timeout_d = msecs_to_jiffies(TIS_TIMEOUT_D_MAX);
>> +	/* init timeouts for wait_for_tpm_stat */
> 
> Remove this comment.
Ok

> 
>> +	chip->timeout_wait_stat_min = TPM_TIMEOUT_USECS_MIN;
>> +	chip->timeout_wait_stat_max = TPM_TIMEOUT_USECS_MAX;
>> 	priv->phy_ops = phy_ops;
>> 	dev_set_drvdata(&chip->dev, priv);
>> 
>> @@ -983,6 +986,13 @@ int tpm_tis_core_init(struct device *dev, struct tpm_tis_data *priv, int irq,
>> 
>> 	priv->manufacturer_id = vendor;
>> 
>> +	if (priv->manufacturer_id == TPM_VID_ATML &&
>> +		!(chip->flags & TPM_CHIP_FLAG_TPM2)) {
>> +		/* If TPM chip is 1.2 ATMEL chip, timeout need to be relaxed*/
>> +		chip->timeout_wait_stat_min = TPM_ATML_TIMEOUT_WAIT_STAT_MIN;
>> +		chip->timeout_wait_stat_max = TPM_ATML_TIMEOUT_WAIT_STAT_MAX;
>> +	}
>> +
>> 	rc = tpm_tis_read8(priv, TPM_RID(0), &rid);
>> 	if (rc < 0)
>> 		goto out_err;
>> diff --git a/include/linux/tpm.h b/include/linux/tpm.h
>> index aa11fe323c56..171b9102c976 100644
>> --- a/include/linux/tpm.h
>> +++ b/include/linux/tpm.h
>> @@ -150,6 +150,8 @@ struct tpm_chip {
>> 	bool timeout_adjusted;
>> 	unsigned long duration[TPM_NUM_DURATIONS]; /* jiffies */
>> 	bool duration_adjusted;
>> +	unsigned int timeout_wait_stat_min; /* usecs */
>> +	unsigned int timeout_wait_stat_max; /* usecs */
>> 
>> 	struct dentry *bios_dir[TPM_NUM_EVENT_LOG_FILES];
>> 
>> @@ -269,6 +271,7 @@ enum tpm2_cc_attrs {
>> #define TPM_VID_INTEL    0x8086
>> #define TPM_VID_WINBOND  0x1050
>> #define TPM_VID_STM      0x104A
>> +#define TPM_VID_ATML     0x1114
>> 
>> enum tpm_chip_flags {
>> 	TPM_CHIP_FLAG_TPM2		= BIT(1),
>> -- 
>> 2.29.0.vfs.0.0
>> 
>> 
> 
> /Jarkko

Hao

^ permalink raw reply	[flat|nested] 91+ messages in thread

* [PATCH v4] tpm: fix Atmel TPM crash caused by too frequent queries
  2021-07-11  7:51       ` [PATCH v3] " Hao Wu
  2021-07-27  2:46         ` Jarkko Sakkinen
@ 2021-08-14 22:25         ` Hao Wu
  2021-08-26  5:38           ` Hao Wu
  2021-09-05  3:51           ` [PATCH v5] " Hao Wu
  1 sibling, 2 replies; 91+ messages in thread
From: Hao Wu @ 2021-08-14 22:25 UTC (permalink / raw)
  To: hao.wu, shrihari.kalkar, seungyeop.han, anish.jhaveri,
	peterhuewe, jarkko, jgg, linux-integrity, pmenzel, kgold, zohar,
	why2jjj.linux, hamza, gregkh, arnd, nayna, James.Bottomley

The Atmel TPM 1.2 chips crash with error
`tpm_try_transmit: send(): error -62` since kernel 4.14.
It is observed from the kernel log after running `tpm_sealdata -z`.
The error thrown from the command is as follows
```
$ tpm_sealdata -z
Tspi_Key_LoadKey failed: 0x00001087 - layer=tddl,
code=0087 (135), I/O error
```

The issue was reproduced with the following Atmel TPM chip:
```
$ tpm_version
T0  TPM 1.2 Version Info:
  Chip Version:        1.2.66.1
  Spec Level:          2
  Errata Revision:     3
  TPM Vendor ID:       ATML
  TPM Version:         01010000
  Manufacturer Info:   41544d4c
```

The root cause of the issue is due to the TPM calls to msleep()
were replaced with usleep_range() [1], which reduces
the actual timeout. Via experiments, it is observed that
the original msleep(5) actually sleeps for 15ms.
Because of a known timeout issue in Atmel TPM 1.2 chip,
the shorter timeout than 15ms can cause the error described above.

A few further changes in kernel 4.16 [2] and 4.18 [3, 4] further
reduced the timeout to less than 1ms. With experiments,
the problematic timeout in the latest kernel is the one
for `wait_for_tpm_stat`.

To fix it, the patch reverts the timeout of `wait_for_tpm_stat`
to 15ms for all Atmel TPM 1.2 chips, but leave it untouched
for Ateml TPM 2.0 chip, and chips from other vendors.
As explained above, the chosen 15ms timeout is
the actual timeout before this issue introduced,
thus the old value is used here.
Particularly, TPM_ATML_TIMEOUT_WAIT_STAT_MIN is set to 14700us,
TPM_ATML_TIMEOUT_WAIT_STAT_MIN is set to 15000us according to
the existing TPM_TIMEOUT_RANGE_US (300us).
The fixed has been tested in the system with the affected Atmel chip
with no issues observed after boot up.

References:
[1] 9f3fc7bcddcb tpm: replace msleep() with usleep_range() in TPM
1.2/2.0 generic drivers
[2] cf151a9a44d5 tpm: reduce tpm polling delay in tpm_tis_core
[3] 59f5a6b07f64 tpm: reduce poll sleep time in tpm_transmit()
[4] 424eaf910c32 tpm: reduce polling time to usecs for even finer
granularity

Fixes: 9f3fc7bcddcb ("tpm: replace msleep() with usleep_range() in TPM 1.2/2.0 generic drivers")
Link: https://patchwork.kernel.org/project/linux-integrity/patch/20200926223150.109645-1-hao.wu@rubrik.com/
Signed-off-by: Hao Wu <hao.wu@rubrik.com>
---
v4:
- Move timeout constants to drivers/char/tpm/tpm_tis_core.h
- Cleanup unnecessary inline comment

v3:
- removes unnecessary condition check in `wait_for_tpm_stat`

v2:
- follow the existing way to define two timeouts (min and max)
  for ATMEL chip, thus keep the exact timeout logic for 
  non-ATEML chips.
- limit the timeout increase to only ATMEL TPM 1.2 chips,
  because it is not an issue for TPM 2.0 chips yet.

Test Plan:
- Run fixed kernel with ATMEL TPM chips and see crash
has been fixed.
- Run fixed kernel with non-ATMEL TPM chips, and confirm
the timeout has not been changed.

 drivers/char/tpm/tpm_tis_core.c | 13 +++++++++++--
 drivers/char/tpm/tpm_tis_core.h |  2 ++
 include/linux/tpm.h             |  3 +++
 3 files changed, 16 insertions(+), 2 deletions(-)

diff --git a/drivers/char/tpm/tpm_tis_core.c b/drivers/char/tpm/tpm_tis_core.c
index 55b9d3965ae1..24605f100e96 100644
--- a/drivers/char/tpm/tpm_tis_core.c
+++ b/drivers/char/tpm/tpm_tis_core.c
@@ -80,8 +80,8 @@ static int wait_for_tpm_stat(struct tpm_chip *chip, u8 mask,
 		}
 	} else {
 		do {
-			usleep_range(TPM_TIMEOUT_USECS_MIN,
-				     TPM_TIMEOUT_USECS_MAX);
+			usleep_range(chip->timeout_wait_stat_min,
+				     chip->timeout_wait_stat_max);
 			status = chip->ops->status(chip);
 			if ((status & mask) == mask)
 				return 0;
@@ -934,6 +934,8 @@ int tpm_tis_core_init(struct device *dev, struct tpm_tis_data *priv, int irq,
 	chip->timeout_b = msecs_to_jiffies(TIS_TIMEOUT_B_MAX);
 	chip->timeout_c = msecs_to_jiffies(TIS_TIMEOUT_C_MAX);
 	chip->timeout_d = msecs_to_jiffies(TIS_TIMEOUT_D_MAX);
+	chip->timeout_wait_stat_min = TPM_TIMEOUT_USECS_MIN;
+	chip->timeout_wait_stat_max = TPM_TIMEOUT_USECS_MAX;
 	priv->phy_ops = phy_ops;
 	dev_set_drvdata(&chip->dev, priv);
 
@@ -983,6 +985,13 @@ int tpm_tis_core_init(struct device *dev, struct tpm_tis_data *priv, int irq,
 
 	priv->manufacturer_id = vendor;
 
+	if (priv->manufacturer_id == TPM_VID_ATML &&
+		!(chip->flags & TPM_CHIP_FLAG_TPM2)) {
+		/* If TPM chip is 1.2 ATMEL chip, timeout need to be relaxed*/
+		chip->timeout_wait_stat_min = TPM_ATML_TIMEOUT_WAIT_STAT_MIN;
+		chip->timeout_wait_stat_max = TPM_ATML_TIMEOUT_WAIT_STAT_MAX;
+	}
+
 	rc = tpm_tis_read8(priv, TPM_RID(0), &rid);
 	if (rc < 0)
 		goto out_err;
diff --git a/drivers/char/tpm/tpm_tis_core.h b/drivers/char/tpm/tpm_tis_core.h
index 9b2d32a59f67..2e431beb44f7 100644
--- a/drivers/char/tpm/tpm_tis_core.h
+++ b/drivers/char/tpm/tpm_tis_core.h
@@ -54,6 +54,8 @@ enum tis_defaults {
 	TIS_MEM_LEN = 0x5000,
 	TIS_SHORT_TIMEOUT = 750,	/* ms */
 	TIS_LONG_TIMEOUT = 2000,	/* 2 sec */
+	TPM_ATML_TIMEOUT_WAIT_STAT_MIN = 14700,	/* usecs */
+	TPM_ATML_TIMEOUT_WAIT_STAT_MAX = 15000,	/* usecs */
 };
 
 /* Some timeout values are needed before it is known whether the chip is
diff --git a/include/linux/tpm.h b/include/linux/tpm.h
index aa11fe323c56..171b9102c976 100644
--- a/include/linux/tpm.h
+++ b/include/linux/tpm.h
@@ -150,6 +150,8 @@ struct tpm_chip {
 	bool timeout_adjusted;
 	unsigned long duration[TPM_NUM_DURATIONS]; /* jiffies */
 	bool duration_adjusted;
+	unsigned int timeout_wait_stat_min; /* usecs */
+	unsigned int timeout_wait_stat_max; /* usecs */
 
 	struct dentry *bios_dir[TPM_NUM_EVENT_LOG_FILES];
 
@@ -269,6 +271,7 @@ enum tpm2_cc_attrs {
 #define TPM_VID_INTEL    0x8086
 #define TPM_VID_WINBOND  0x1050
 #define TPM_VID_STM      0x104A
+#define TPM_VID_ATML     0x1114
 
 enum tpm_chip_flags {
 	TPM_CHIP_FLAG_TPM2		= BIT(1),
-- 
2.29.0.vfs.0.0


^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: [PATCH v4] tpm: fix Atmel TPM crash caused by too frequent queries
  2021-08-14 22:25         ` [PATCH v4] " Hao Wu
@ 2021-08-26  5:38           ` Hao Wu
  2021-08-26 16:24             ` Jarkko Sakkinen
  2021-09-05  3:51           ` [PATCH v5] " Hao Wu
  1 sibling, 1 reply; 91+ messages in thread
From: Hao Wu @ 2021-08-26  5:38 UTC (permalink / raw)
  To: Hao Wu, Shrihari Kalkar, Han Seungyeop, Anish Jhaveri,
	peterhuewe, Jarkko Sakkinen, jgg, linux-integrity, Paul Menzel,
	Ken Goldman, zohar, why2jjj.linux, Hamza Attak, gregkh, arnd,
	Nayna, James Bottomley

> On Aug 14, 2021, at 3:25 PM, Hao Wu <hao.wu@rubrik.com> wrote:
> 
> The Atmel TPM 1.2 chips crash with error
> `tpm_try_transmit: send(): error -62` since kernel 4.14.
> It is observed from the kernel log after running `tpm_sealdata -z`.
> The error thrown from the command is as follows
> ```
> $ tpm_sealdata -z
> Tspi_Key_LoadKey failed: 0x00001087 - layer=tddl,
> code=0087 (135), I/O error
> ```
> 
> The issue was reproduced with the following Atmel TPM chip:
> ```
> $ tpm_version
> T0  TPM 1.2 Version Info:
>  Chip Version:        1.2.66.1
>  Spec Level:          2
>  Errata Revision:     3
>  TPM Vendor ID:       ATML
>  TPM Version:         01010000
>  Manufacturer Info:   41544d4c
> ```
> 
> The root cause of the issue is due to the TPM calls to msleep()
> were replaced with usleep_range() [1], which reduces
> the actual timeout. Via experiments, it is observed that
> the original msleep(5) actually sleeps for 15ms.
> Because of a known timeout issue in Atmel TPM 1.2 chip,
> the shorter timeout than 15ms can cause the error described above.
> 
> A few further changes in kernel 4.16 [2] and 4.18 [3, 4] further
> reduced the timeout to less than 1ms. With experiments,
> the problematic timeout in the latest kernel is the one
> for `wait_for_tpm_stat`.
> 
> To fix it, the patch reverts the timeout of `wait_for_tpm_stat`
> to 15ms for all Atmel TPM 1.2 chips, but leave it untouched
> for Ateml TPM 2.0 chip, and chips from other vendors.
> As explained above, the chosen 15ms timeout is
> the actual timeout before this issue introduced,
> thus the old value is used here.
> Particularly, TPM_ATML_TIMEOUT_WAIT_STAT_MIN is set to 14700us,
> TPM_ATML_TIMEOUT_WAIT_STAT_MIN is set to 15000us according to
> the existing TPM_TIMEOUT_RANGE_US (300us).
> The fixed has been tested in the system with the affected Atmel chip
> with no issues observed after boot up.
> 
> References:
> [1] 9f3fc7bcddcb tpm: replace msleep() with usleep_range() in TPM
> 1.2/2.0 generic drivers
> [2] cf151a9a44d5 tpm: reduce tpm polling delay in tpm_tis_core
> [3] 59f5a6b07f64 tpm: reduce poll sleep time in tpm_transmit()
> [4] 424eaf910c32 tpm: reduce polling time to usecs for even finer
> granularity
> 
> Fixes: 9f3fc7bcddcb ("tpm: replace msleep() with usleep_range() in TPM 1.2/2.0 generic drivers")
> Link: https://patchwork.kernel.org/project/linux-integrity/patch/20200926223150.109645-1-hao.wu@rubrik.com/
> Signed-off-by: Hao Wu <hao.wu@rubrik.com>
> ---
> v4:
> - Move timeout constants to drivers/char/tpm/tpm_tis_core.h
> - Cleanup unnecessary inline comment
> 
> v3:
> - removes unnecessary condition check in `wait_for_tpm_stat`
> 
> v2:
> - follow the existing way to define two timeouts (min and max)
>  for ATMEL chip, thus keep the exact timeout logic for 
>  non-ATEML chips.
> - limit the timeout increase to only ATMEL TPM 1.2 chips,
>  because it is not an issue for TPM 2.0 chips yet.
> 
> Test Plan:
> - Run fixed kernel with ATMEL TPM chips and see crash
> has been fixed.
> - Run fixed kernel with non-ATMEL TPM chips, and confirm
> the timeout has not been changed.
> 
> drivers/char/tpm/tpm_tis_core.c | 13 +++++++++++--
> drivers/char/tpm/tpm_tis_core.h |  2 ++
> include/linux/tpm.h             |  3 +++
> 3 files changed, 16 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/char/tpm/tpm_tis_core.c b/drivers/char/tpm/tpm_tis_core.c
> index 55b9d3965ae1..24605f100e96 100644
> --- a/drivers/char/tpm/tpm_tis_core.c
> +++ b/drivers/char/tpm/tpm_tis_core.c
> @@ -80,8 +80,8 @@ static int wait_for_tpm_stat(struct tpm_chip *chip, u8 mask,
> 		}
> 	} else {
> 		do {
> -			usleep_range(TPM_TIMEOUT_USECS_MIN,
> -				     TPM_TIMEOUT_USECS_MAX);
> +			usleep_range(chip->timeout_wait_stat_min,
> +				     chip->timeout_wait_stat_max);
> 			status = chip->ops->status(chip);
> 			if ((status & mask) == mask)
> 				return 0;
> @@ -934,6 +934,8 @@ int tpm_tis_core_init(struct device *dev, struct tpm_tis_data *priv, int irq,
> 	chip->timeout_b = msecs_to_jiffies(TIS_TIMEOUT_B_MAX);
> 	chip->timeout_c = msecs_to_jiffies(TIS_TIMEOUT_C_MAX);
> 	chip->timeout_d = msecs_to_jiffies(TIS_TIMEOUT_D_MAX);
> +	chip->timeout_wait_stat_min = TPM_TIMEOUT_USECS_MIN;
> +	chip->timeout_wait_stat_max = TPM_TIMEOUT_USECS_MAX;
> 	priv->phy_ops = phy_ops;
> 	dev_set_drvdata(&chip->dev, priv);
> 
> @@ -983,6 +985,13 @@ int tpm_tis_core_init(struct device *dev, struct tpm_tis_data *priv, int irq,
> 
> 	priv->manufacturer_id = vendor;
> 
> +	if (priv->manufacturer_id == TPM_VID_ATML &&
> +		!(chip->flags & TPM_CHIP_FLAG_TPM2)) {
> +		/* If TPM chip is 1.2 ATMEL chip, timeout need to be relaxed*/
> +		chip->timeout_wait_stat_min = TPM_ATML_TIMEOUT_WAIT_STAT_MIN;
> +		chip->timeout_wait_stat_max = TPM_ATML_TIMEOUT_WAIT_STAT_MAX;
> +	}
> +
> 	rc = tpm_tis_read8(priv, TPM_RID(0), &rid);
> 	if (rc < 0)
> 		goto out_err;
> diff --git a/drivers/char/tpm/tpm_tis_core.h b/drivers/char/tpm/tpm_tis_core.h
> index 9b2d32a59f67..2e431beb44f7 100644
> --- a/drivers/char/tpm/tpm_tis_core.h
> +++ b/drivers/char/tpm/tpm_tis_core.h
> @@ -54,6 +54,8 @@ enum tis_defaults {
> 	TIS_MEM_LEN = 0x5000,
> 	TIS_SHORT_TIMEOUT = 750,	/* ms */
> 	TIS_LONG_TIMEOUT = 2000,	/* 2 sec */
> +	TPM_ATML_TIMEOUT_WAIT_STAT_MIN = 14700,	/* usecs */
> +	TPM_ATML_TIMEOUT_WAIT_STAT_MAX = 15000,	/* usecs */
> };
> 
> /* Some timeout values are needed before it is known whether the chip is
> diff --git a/include/linux/tpm.h b/include/linux/tpm.h
> index aa11fe323c56..171b9102c976 100644
> --- a/include/linux/tpm.h
> +++ b/include/linux/tpm.h
> @@ -150,6 +150,8 @@ struct tpm_chip {
> 	bool timeout_adjusted;
> 	unsigned long duration[TPM_NUM_DURATIONS]; /* jiffies */
> 	bool duration_adjusted;
> +	unsigned int timeout_wait_stat_min; /* usecs */
> +	unsigned int timeout_wait_stat_max; /* usecs */
> 
> 	struct dentry *bios_dir[TPM_NUM_EVENT_LOG_FILES];
> 
> @@ -269,6 +271,7 @@ enum tpm2_cc_attrs {
> #define TPM_VID_INTEL    0x8086
> #define TPM_VID_WINBOND  0x1050
> #define TPM_VID_STM      0x104A
> +#define TPM_VID_ATML     0x1114
> 
> enum tpm_chip_flags {
> 	TPM_CHIP_FLAG_TPM2		= BIT(1),
> -- 
> 2.29.0.vfs.0.0
> 

Just kindly remind this code review in case it has been missed somehow

Thanks
Hao


^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: [PATCH v4] tpm: fix Atmel TPM crash caused by too frequent queries
  2021-08-26  5:38           ` Hao Wu
@ 2021-08-26 16:24             ` Jarkko Sakkinen
  2021-08-27  0:35               ` Hao Wu
  0 siblings, 1 reply; 91+ messages in thread
From: Jarkko Sakkinen @ 2021-08-26 16:24 UTC (permalink / raw)
  To: Hao Wu, Shrihari Kalkar, Han Seungyeop, Anish Jhaveri,
	peterhuewe, jgg, linux-integrity, Paul Menzel, Ken Goldman,
	zohar, why2jjj.linux, Hamza Attak, gregkh, arnd, Nayna,
	James Bottomley

On Wed, 2021-08-25 at 22:38 -0700, Hao Wu wrote:
> > On Aug 14, 2021, at 3:25 PM, Hao Wu <hao.wu@rubrik.com> wrote:
> > 
> > The Atmel TPM 1.2 chips crash with error
> > `tpm_try_transmit: send(): error -62` since kernel 4.14.
> > It is observed from the kernel log after running `tpm_sealdata -z`.
> > The error thrown from the command is as follows
> > ```
> > $ tpm_sealdata -z
> > Tspi_Key_LoadKey failed: 0x00001087 - layer=tddl,
> > code=0087 (135), I/O error
> > ```
> > 
> > The issue was reproduced with the following Atmel TPM chip:
> > ```
> > $ tpm_version
> > T0  TPM 1.2 Version Info:
> >  Chip Version:        1.2.66.1
> >  Spec Level:          2
> >  Errata Revision:     3
> >  TPM Vendor ID:       ATML
> >  TPM Version:         01010000
> >  Manufacturer Info:   41544d4c
> > ```
> > 
> > The root cause of the issue is due to the TPM calls to msleep()
> > were replaced with usleep_range() [1], which reduces
> > the actual timeout. Via experiments, it is observed that
> > the original msleep(5) actually sleeps for 15ms.
> > Because of a known timeout issue in Atmel TPM 1.2 chip,
> > the shorter timeout than 15ms can cause the error described above.
> > 
> > A few further changes in kernel 4.16 [2] and 4.18 [3, 4] further
> > reduced the timeout to less than 1ms. With experiments,
> > the problematic timeout in the latest kernel is the one
> > for `wait_for_tpm_stat`.
> > 
> > To fix it, the patch reverts the timeout of `wait_for_tpm_stat`
> > to 15ms for all Atmel TPM 1.2 chips, but leave it untouched
> > for Ateml TPM 2.0 chip, and chips from other vendors.
> > As explained above, the chosen 15ms timeout is
> > the actual timeout before this issue introduced,
> > thus the old value is used here.
> > Particularly, TPM_ATML_TIMEOUT_WAIT_STAT_MIN is set to 14700us,
> > TPM_ATML_TIMEOUT_WAIT_STAT_MIN is set to 15000us according to
> > the existing TPM_TIMEOUT_RANGE_US (300us).
> > The fixed has been tested in the system with the affected Atmel chip
> > with no issues observed after boot up.
> > 
> > References:
> > [1] 9f3fc7bcddcb tpm: replace msleep() with usleep_range() in TPM
> > 1.2/2.0 generic drivers
> > [2] cf151a9a44d5 tpm: reduce tpm polling delay in tpm_tis_core
> > [3] 59f5a6b07f64 tpm: reduce poll sleep time in tpm_transmit()
> > [4] 424eaf910c32 tpm: reduce polling time to usecs for even finer
> > granularity
> > 
> > Fixes: 9f3fc7bcddcb ("tpm: replace msleep() with usleep_range() in TPM 1.2/2.0 generic drivers")
> > Link: https://patchwork.kernel.org/project/linux-integrity/patch/20200926223150.109645-1-hao.wu@rubrik.com/
> > Signed-off-by: Hao Wu <hao.wu@rubrik.com>
> > ---
> > v4:
> > - Move timeout constants to drivers/char/tpm/tpm_tis_core.h
> > - Cleanup unnecessary inline comment
> > 
> > v3:
> > - removes unnecessary condition check in `wait_for_tpm_stat`
> > 
> > v2:
> > - follow the existing way to define two timeouts (min and max)
> >  for ATMEL chip, thus keep the exact timeout logic for 
> >  non-ATEML chips.
> > - limit the timeout increase to only ATMEL TPM 1.2 chips,
> >  because it is not an issue for TPM 2.0 chips yet.
> > 
> > Test Plan:
> > - Run fixed kernel with ATMEL TPM chips and see crash
> > has been fixed.
> > - Run fixed kernel with non-ATMEL TPM chips, and confirm
> > the timeout has not been changed.
> > 
> > drivers/char/tpm/tpm_tis_core.c | 13 +++++++++++--
> > drivers/char/tpm/tpm_tis_core.h |  2 ++
> > include/linux/tpm.h             |  3 +++
> > 3 files changed, 16 insertions(+), 2 deletions(-)
> > 
> > diff --git a/drivers/char/tpm/tpm_tis_core.c b/drivers/char/tpm/tpm_tis_core.c
> > index 55b9d3965ae1..24605f100e96 100644
> > --- a/drivers/char/tpm/tpm_tis_core.c
> > +++ b/drivers/char/tpm/tpm_tis_core.c
> > @@ -80,8 +80,8 @@ static int wait_for_tpm_stat(struct tpm_chip *chip, u8 mask,
> > 		}
> > 	} else {
> > 		do {
> > -			usleep_range(TPM_TIMEOUT_USECS_MIN,
> > -				     TPM_TIMEOUT_USECS_MAX);
> > +			usleep_range(chip->timeout_wait_stat_min,
> > +				     chip->timeout_wait_stat_max);
> > 			status = chip->ops->status(chip);
> > 			if ((status & mask) == mask)
> > 				return 0;
> > @@ -934,6 +934,8 @@ int tpm_tis_core_init(struct device *dev, struct tpm_tis_data *priv, int irq,
> > 	chip->timeout_b = msecs_to_jiffies(TIS_TIMEOUT_B_MAX);
> > 	chip->timeout_c = msecs_to_jiffies(TIS_TIMEOUT_C_MAX);
> > 	chip->timeout_d = msecs_to_jiffies(TIS_TIMEOUT_D_MAX);
> > +	chip->timeout_wait_stat_min = TPM_TIMEOUT_USECS_MIN;
> > +	chip->timeout_wait_stat_max = TPM_TIMEOUT_USECS_MAX;
> > 	priv->phy_ops = phy_ops;
> > 	dev_set_drvdata(&chip->dev, priv);
> > 
> > @@ -983,6 +985,13 @@ int tpm_tis_core_init(struct device *dev, struct tpm_tis_data *priv, int irq,
> > 
> > 	priv->manufacturer_id = vendor;
> > 
> > +	if (priv->manufacturer_id == TPM_VID_ATML &&
> > +		!(chip->flags & TPM_CHIP_FLAG_TPM2)) {
> > +		/* If TPM chip is 1.2 ATMEL chip, timeout need to be relaxed*/
> > +		chip->timeout_wait_stat_min = TPM_ATML_TIMEOUT_WAIT_STAT_MIN;
> > +		chip->timeout_wait_stat_max = TPM_ATML_TIMEOUT_WAIT_STAT_MAX;
> > +	}
> > +
> > 	rc = tpm_tis_read8(priv, TPM_RID(0), &rid);
> > 	if (rc < 0)
> > 		goto out_err;
> > diff --git a/drivers/char/tpm/tpm_tis_core.h b/drivers/char/tpm/tpm_tis_core.h
> > index 9b2d32a59f67..2e431beb44f7 100644
> > --- a/drivers/char/tpm/tpm_tis_core.h
> > +++ b/drivers/char/tpm/tpm_tis_core.h
> > @@ -54,6 +54,8 @@ enum tis_defaults {
> > 	TIS_MEM_LEN = 0x5000,
> > 	TIS_SHORT_TIMEOUT = 750,	/* ms */
> > 	TIS_LONG_TIMEOUT = 2000,	/* 2 sec */
> > +	TPM_ATML_TIMEOUT_WAIT_STAT_MIN = 14700,	/* usecs */
> > +	TPM_ATML_TIMEOUT_WAIT_STAT_MAX = 15000,	/* usecs */
> > };

I'd prefer TIS_TIMEOUT_{MIN, MAX}_ATML. I.e. no "WAIT_STAT" and without "TPM_"
to be consistent with other constants here.

> > 
> > /* Some timeout values are needed before it is known whether the chip is
> > diff --git a/include/linux/tpm.h b/include/linux/tpm.h
> > index aa11fe323c56..171b9102c976 100644
> > --- a/include/linux/tpm.h
> > +++ b/include/linux/tpm.h
> > @@ -150,6 +150,8 @@ struct tpm_chip {
> > 	bool timeout_adjusted;
> > 	unsigned long duration[TPM_NUM_DURATIONS]; /* jiffies */
> > 	bool duration_adjusted;
> > +	unsigned int timeout_wait_stat_min; /* usecs */
> > +	unsigned int timeout_wait_stat_max; /* usecs */

Please rename as timeout_{min, max}.

And I think tpm_chip is wrong place to put them as they are TIS
specific, i.e. they should be in tpm_tis_data.

> > 
> > 	struct dentry *bios_dir[TPM_NUM_EVENT_LOG_FILES];
> > 
> > @@ -269,6 +271,7 @@ enum tpm2_cc_attrs {
> > #define TPM_VID_INTEL    0x8086
> > #define TPM_VID_WINBOND  0x1050
> > #define TPM_VID_STM      0x104A
> > +#define TPM_VID_ATML     0x1114
> > 
> > enum tpm_chip_flags {
> > 	TPM_CHIP_FLAG_TPM2		= BIT(1),
> > -- 
> > 2.29.0.vfs.0.0
> > 
> 
> Just kindly remind this code review in case it has been missed somehow

I'm sorry, my bad. I managed to somehow miss this. Might be because
I've been recently reorganizing my email accounts. And thanks for
pinging so that I spotted it.

> Thanks
> Hao

/Jarkko


^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: [PATCH v4] tpm: fix Atmel TPM crash caused by too frequent queries
  2021-08-26 16:24             ` Jarkko Sakkinen
@ 2021-08-27  0:35               ` Hao Wu
  2021-09-04 21:14                 ` Hao Wu
  0 siblings, 1 reply; 91+ messages in thread
From: Hao Wu @ 2021-08-27  0:35 UTC (permalink / raw)
  To: Jarkko Sakkinen
  Cc: Shrihari Kalkar, Han Seungyeop, Anish Jhaveri, peterhuewe, jgg,
	linux-integrity, Paul Menzel, Ken Goldman, zohar, why2jjj.linux,
	Hamza Attak, gregkh, arnd, Nayna, James Bottomley

> On Aug 26, 2021, at 9:24 AM, Jarkko Sakkinen <jarkko@kernel.org> wrote:
> 
> On Wed, 2021-08-25 at 22:38 -0700, Hao Wu wrote:
>>> On Aug 14, 2021, at 3:25 PM, Hao Wu <hao.wu@rubrik.com> wrote:
>>> 
>>> The Atmel TPM 1.2 chips crash with error
>>> `tpm_try_transmit: send(): error -62` since kernel 4.14.
>>> It is observed from the kernel log after running `tpm_sealdata -z`.
>>> The error thrown from the command is as follows
>>> ```
>>> $ tpm_sealdata -z
>>> Tspi_Key_LoadKey failed: 0x00001087 - layer=tddl,
>>> code=0087 (135), I/O error
>>> ```
>>> 
>>> The issue was reproduced with the following Atmel TPM chip:
>>> ```
>>> $ tpm_version
>>> T0  TPM 1.2 Version Info:
>>> Chip Version:        1.2.66.1
>>> Spec Level:          2
>>> Errata Revision:     3
>>> TPM Vendor ID:       ATML
>>> TPM Version:         01010000
>>> Manufacturer Info:   41544d4c
>>> ```
>>> 
>>> The root cause of the issue is due to the TPM calls to msleep()
>>> were replaced with usleep_range() [1], which reduces
>>> the actual timeout. Via experiments, it is observed that
>>> the original msleep(5) actually sleeps for 15ms.
>>> Because of a known timeout issue in Atmel TPM 1.2 chip,
>>> the shorter timeout than 15ms can cause the error described above.
>>> 
>>> A few further changes in kernel 4.16 [2] and 4.18 [3, 4] further
>>> reduced the timeout to less than 1ms. With experiments,
>>> the problematic timeout in the latest kernel is the one
>>> for `wait_for_tpm_stat`.
>>> 
>>> To fix it, the patch reverts the timeout of `wait_for_tpm_stat`
>>> to 15ms for all Atmel TPM 1.2 chips, but leave it untouched
>>> for Ateml TPM 2.0 chip, and chips from other vendors.
>>> As explained above, the chosen 15ms timeout is
>>> the actual timeout before this issue introduced,
>>> thus the old value is used here.
>>> Particularly, TPM_ATML_TIMEOUT_WAIT_STAT_MIN is set to 14700us,
>>> TPM_ATML_TIMEOUT_WAIT_STAT_MIN is set to 15000us according to
>>> the existing TPM_TIMEOUT_RANGE_US (300us).
>>> The fixed has been tested in the system with the affected Atmel chip
>>> with no issues observed after boot up.
>>> 
>>> References:
>>> [1] 9f3fc7bcddcb tpm: replace msleep() with usleep_range() in TPM
>>> 1.2/2.0 generic drivers
>>> [2] cf151a9a44d5 tpm: reduce tpm polling delay in tpm_tis_core
>>> [3] 59f5a6b07f64 tpm: reduce poll sleep time in tpm_transmit()
>>> [4] 424eaf910c32 tpm: reduce polling time to usecs for even finer
>>> granularity
>>> 
>>> Fixes: 9f3fc7bcddcb ("tpm: replace msleep() with usleep_range() in TPM 1.2/2.0 generic drivers")
>>> Link: https://patchwork.kernel.org/project/linux-integrity/patch/20200926223150.109645-1-hao.wu@rubrik.com/
>>> Signed-off-by: Hao Wu <hao.wu@rubrik.com>
>>> ---
>>> v4:
>>> - Move timeout constants to drivers/char/tpm/tpm_tis_core.h
>>> - Cleanup unnecessary inline comment
>>> 
>>> v3:
>>> - removes unnecessary condition check in `wait_for_tpm_stat`
>>> 
>>> v2:
>>> - follow the existing way to define two timeouts (min and max)
>>> for ATMEL chip, thus keep the exact timeout logic for 
>>> non-ATEML chips.
>>> - limit the timeout increase to only ATMEL TPM 1.2 chips,
>>> because it is not an issue for TPM 2.0 chips yet.
>>> 
>>> Test Plan:
>>> - Run fixed kernel with ATMEL TPM chips and see crash
>>> has been fixed.
>>> - Run fixed kernel with non-ATMEL TPM chips, and confirm
>>> the timeout has not been changed.
>>> 
>>> drivers/char/tpm/tpm_tis_core.c | 13 +++++++++++--
>>> drivers/char/tpm/tpm_tis_core.h |  2 ++
>>> include/linux/tpm.h             |  3 +++
>>> 3 files changed, 16 insertions(+), 2 deletions(-)
>>> 
>>> diff --git a/drivers/char/tpm/tpm_tis_core.c b/drivers/char/tpm/tpm_tis_core.c
>>> index 55b9d3965ae1..24605f100e96 100644
>>> --- a/drivers/char/tpm/tpm_tis_core.c
>>> +++ b/drivers/char/tpm/tpm_tis_core.c
>>> @@ -80,8 +80,8 @@ static int wait_for_tpm_stat(struct tpm_chip *chip, u8 mask,
>>> 		}
>>> 	} else {
>>> 		do {
>>> -			usleep_range(TPM_TIMEOUT_USECS_MIN,
>>> -				     TPM_TIMEOUT_USECS_MAX);
>>> +			usleep_range(chip->timeout_wait_stat_min,
>>> +				     chip->timeout_wait_stat_max);
>>> 			status = chip->ops->status(chip);
>>> 			if ((status & mask) == mask)
>>> 				return 0;
>>> @@ -934,6 +934,8 @@ int tpm_tis_core_init(struct device *dev, struct tpm_tis_data *priv, int irq,
>>> 	chip->timeout_b = msecs_to_jiffies(TIS_TIMEOUT_B_MAX);
>>> 	chip->timeout_c = msecs_to_jiffies(TIS_TIMEOUT_C_MAX);
>>> 	chip->timeout_d = msecs_to_jiffies(TIS_TIMEOUT_D_MAX);
>>> +	chip->timeout_wait_stat_min = TPM_TIMEOUT_USECS_MIN;
>>> +	chip->timeout_wait_stat_max = TPM_TIMEOUT_USECS_MAX;
>>> 	priv->phy_ops = phy_ops;
>>> 	dev_set_drvdata(&chip->dev, priv);
>>> 
>>> @@ -983,6 +985,13 @@ int tpm_tis_core_init(struct device *dev, struct tpm_tis_data *priv, int irq,
>>> 
>>> 	priv->manufacturer_id = vendor;
>>> 
>>> +	if (priv->manufacturer_id == TPM_VID_ATML &&
>>> +		!(chip->flags & TPM_CHIP_FLAG_TPM2)) {
>>> +		/* If TPM chip is 1.2 ATMEL chip, timeout need to be relaxed*/
>>> +		chip->timeout_wait_stat_min = TPM_ATML_TIMEOUT_WAIT_STAT_MIN;
>>> +		chip->timeout_wait_stat_max = TPM_ATML_TIMEOUT_WAIT_STAT_MAX;
>>> +	}
>>> +
>>> 	rc = tpm_tis_read8(priv, TPM_RID(0), &rid);
>>> 	if (rc < 0)
>>> 		goto out_err;
>>> diff --git a/drivers/char/tpm/tpm_tis_core.h b/drivers/char/tpm/tpm_tis_core.h
>>> index 9b2d32a59f67..2e431beb44f7 100644
>>> --- a/drivers/char/tpm/tpm_tis_core.h
>>> +++ b/drivers/char/tpm/tpm_tis_core.h
>>> @@ -54,6 +54,8 @@ enum tis_defaults {
>>> 	TIS_MEM_LEN = 0x5000,
>>> 	TIS_SHORT_TIMEOUT = 750,	/* ms */
>>> 	TIS_LONG_TIMEOUT = 2000,	/* 2 sec */
>>> +	TPM_ATML_TIMEOUT_WAIT_STAT_MIN = 14700,	/* usecs */
>>> +	TPM_ATML_TIMEOUT_WAIT_STAT_MAX = 15000,	/* usecs */
>>> };
> 
> I'd prefer TIS_TIMEOUT_{MIN, MAX}_ATML. I.e. no "WAIT_STAT" and without "TPM_"
> to be consistent with other constants here.
Ok will do
> 
>>> 
>>> /* Some timeout values are needed before it is known whether the chip is
>>> diff --git a/include/linux/tpm.h b/include/linux/tpm.h
>>> index aa11fe323c56..171b9102c976 100644
>>> --- a/include/linux/tpm.h
>>> +++ b/include/linux/tpm.h
>>> @@ -150,6 +150,8 @@ struct tpm_chip {
>>> 	bool timeout_adjusted;
>>> 	unsigned long duration[TPM_NUM_DURATIONS]; /* jiffies */
>>> 	bool duration_adjusted;
>>> +	unsigned int timeout_wait_stat_min; /* usecs */
>>> +	unsigned int timeout_wait_stat_max; /* usecs */
> 
> Please rename as timeout_{min, max}.
Ok will do
> 
> And I think tpm_chip is wrong place to put them as they are TIS
> specific, i.e. they should be in tpm_tis_data.
Sorry, I am not familiar with tpm_tis_data, could tell the the place that you want me to put the var? 
I think I may have hard time to move forward according toward this comment due to bandwidth constraints.
Some helps would be appreciated. 

Is tpm_tis_data something specific to a chip instance ? Given the values are tied to chip,
we need chip specific instance to make this work.

> 
>>> 
>>> 	struct dentry *bios_dir[TPM_NUM_EVENT_LOG_FILES];
>>> 
>>> @@ -269,6 +271,7 @@ enum tpm2_cc_attrs {
>>> #define TPM_VID_INTEL    0x8086
>>> #define TPM_VID_WINBOND  0x1050
>>> #define TPM_VID_STM      0x104A
>>> +#define TPM_VID_ATML     0x1114
>>> 
>>> enum tpm_chip_flags {
>>> 	TPM_CHIP_FLAG_TPM2		= BIT(1),
>>> -- 
>>> 2.29.0.vfs.0.0
>>> 
>> 
>> Just kindly remind this code review in case it has been missed somehow
> 
> I'm sorry, my bad. I managed to somehow miss this. Might be because
> I've been recently reorganizing my email accounts. And thanks for
> pinging so that I spotted it.
No worries, thanks for quick response!

> 
>> Thanks
>> Hao
> 
> /Jarkko

Hao

^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: [PATCH v4] tpm: fix Atmel TPM crash caused by too frequent queries
  2021-08-27  0:35               ` Hao Wu
@ 2021-09-04 21:14                 ` Hao Wu
  2021-09-04 23:15                   ` Hao Wu
  0 siblings, 1 reply; 91+ messages in thread
From: Hao Wu @ 2021-09-04 21:14 UTC (permalink / raw)
  To: Jarkko Sakkinen
  Cc: Shrihari Kalkar, Han Seungyeop, Anish Jhaveri, peterhuewe, jgg,
	linux-integrity, Paul Menzel, Ken Goldman, zohar, why2jjj.linux,
	Hamza Attak, gregkh, arnd, Nayna, James Bottomley



> On Aug 26, 2021, at 5:35 PM, Hao Wu <hao.wu@rubrik.com> wrote:
> 
>> On Aug 26, 2021, at 9:24 AM, Jarkko Sakkinen <jarkko@kernel.org> wrote:
>> 
>> On Wed, 2021-08-25 at 22:38 -0700, Hao Wu wrote:
>>>> On Aug 14, 2021, at 3:25 PM, Hao Wu <hao.wu@rubrik.com> wrote:
>>>> 
>>>> The Atmel TPM 1.2 chips crash with error
>>>> `tpm_try_transmit: send(): error -62` since kernel 4.14.
>>>> It is observed from the kernel log after running `tpm_sealdata -z`.
>>>> The error thrown from the command is as follows
>>>> ```
>>>> $ tpm_sealdata -z
>>>> Tspi_Key_LoadKey failed: 0x00001087 - layer=tddl,
>>>> code=0087 (135), I/O error
>>>> ```
>>>> 
>>>> The issue was reproduced with the following Atmel TPM chip:
>>>> ```
>>>> $ tpm_version
>>>> T0  TPM 1.2 Version Info:
>>>> Chip Version:        1.2.66.1
>>>> Spec Level:          2
>>>> Errata Revision:     3
>>>> TPM Vendor ID:       ATML
>>>> TPM Version:         01010000
>>>> Manufacturer Info:   41544d4c
>>>> ```
>>>> 
>>>> The root cause of the issue is due to the TPM calls to msleep()
>>>> were replaced with usleep_range() [1], which reduces
>>>> the actual timeout. Via experiments, it is observed that
>>>> the original msleep(5) actually sleeps for 15ms.
>>>> Because of a known timeout issue in Atmel TPM 1.2 chip,
>>>> the shorter timeout than 15ms can cause the error described above.
>>>> 
>>>> A few further changes in kernel 4.16 [2] and 4.18 [3, 4] further
>>>> reduced the timeout to less than 1ms. With experiments,
>>>> the problematic timeout in the latest kernel is the one
>>>> for `wait_for_tpm_stat`.
>>>> 
>>>> To fix it, the patch reverts the timeout of `wait_for_tpm_stat`
>>>> to 15ms for all Atmel TPM 1.2 chips, but leave it untouched
>>>> for Ateml TPM 2.0 chip, and chips from other vendors.
>>>> As explained above, the chosen 15ms timeout is
>>>> the actual timeout before this issue introduced,
>>>> thus the old value is used here.
>>>> Particularly, TPM_ATML_TIMEOUT_WAIT_STAT_MIN is set to 14700us,
>>>> TPM_ATML_TIMEOUT_WAIT_STAT_MIN is set to 15000us according to
>>>> the existing TPM_TIMEOUT_RANGE_US (300us).
>>>> The fixed has been tested in the system with the affected Atmel chip
>>>> with no issues observed after boot up.
>>>> 
>>>> References:
>>>> [1] 9f3fc7bcddcb tpm: replace msleep() with usleep_range() in TPM
>>>> 1.2/2.0 generic drivers
>>>> [2] cf151a9a44d5 tpm: reduce tpm polling delay in tpm_tis_core
>>>> [3] 59f5a6b07f64 tpm: reduce poll sleep time in tpm_transmit()
>>>> [4] 424eaf910c32 tpm: reduce polling time to usecs for even finer
>>>> granularity
>>>> 
>>>> Fixes: 9f3fc7bcddcb ("tpm: replace msleep() with usleep_range() in TPM 1.2/2.0 generic drivers")
>>>> Link: https://patchwork.kernel.org/project/linux-integrity/patch/20200926223150.109645-1-hao.wu@rubrik.com/
>>>> Signed-off-by: Hao Wu <hao.wu@rubrik.com>
>>>> ---
>>>> v4:
>>>> - Move timeout constants to drivers/char/tpm/tpm_tis_core.h
>>>> - Cleanup unnecessary inline comment
>>>> 
>>>> v3:
>>>> - removes unnecessary condition check in `wait_for_tpm_stat`
>>>> 
>>>> v2:
>>>> - follow the existing way to define two timeouts (min and max)
>>>> for ATMEL chip, thus keep the exact timeout logic for 
>>>> non-ATEML chips.
>>>> - limit the timeout increase to only ATMEL TPM 1.2 chips,
>>>> because it is not an issue for TPM 2.0 chips yet.
>>>> 
>>>> Test Plan:
>>>> - Run fixed kernel with ATMEL TPM chips and see crash
>>>> has been fixed.
>>>> - Run fixed kernel with non-ATMEL TPM chips, and confirm
>>>> the timeout has not been changed.
>>>> 
>>>> drivers/char/tpm/tpm_tis_core.c | 13 +++++++++++--
>>>> drivers/char/tpm/tpm_tis_core.h |  2 ++
>>>> include/linux/tpm.h             |  3 +++
>>>> 3 files changed, 16 insertions(+), 2 deletions(-)
>>>> 
>>>> diff --git a/drivers/char/tpm/tpm_tis_core.c b/drivers/char/tpm/tpm_tis_core.c
>>>> index 55b9d3965ae1..24605f100e96 100644
>>>> --- a/drivers/char/tpm/tpm_tis_core.c
>>>> +++ b/drivers/char/tpm/tpm_tis_core.c
>>>> @@ -80,8 +80,8 @@ static int wait_for_tpm_stat(struct tpm_chip *chip, u8 mask,
>>>> 		}
>>>> 	} else {
>>>> 		do {
>>>> -			usleep_range(TPM_TIMEOUT_USECS_MIN,
>>>> -				     TPM_TIMEOUT_USECS_MAX);
>>>> +			usleep_range(chip->timeout_wait_stat_min,
>>>> +				     chip->timeout_wait_stat_max);
>>>> 			status = chip->ops->status(chip);
>>>> 			if ((status & mask) == mask)
>>>> 				return 0;
>>>> @@ -934,6 +934,8 @@ int tpm_tis_core_init(struct device *dev, struct tpm_tis_data *priv, int irq,
>>>> 	chip->timeout_b = msecs_to_jiffies(TIS_TIMEOUT_B_MAX);
>>>> 	chip->timeout_c = msecs_to_jiffies(TIS_TIMEOUT_C_MAX);
>>>> 	chip->timeout_d = msecs_to_jiffies(TIS_TIMEOUT_D_MAX);
>>>> +	chip->timeout_wait_stat_min = TPM_TIMEOUT_USECS_MIN;
>>>> +	chip->timeout_wait_stat_max = TPM_TIMEOUT_USECS_MAX;
>>>> 	priv->phy_ops = phy_ops;
>>>> 	dev_set_drvdata(&chip->dev, priv);
>>>> 
>>>> @@ -983,6 +985,13 @@ int tpm_tis_core_init(struct device *dev, struct tpm_tis_data *priv, int irq,
>>>> 
>>>> 	priv->manufacturer_id = vendor;
>>>> 
>>>> +	if (priv->manufacturer_id == TPM_VID_ATML &&
>>>> +		!(chip->flags & TPM_CHIP_FLAG_TPM2)) {
>>>> +		/* If TPM chip is 1.2 ATMEL chip, timeout need to be relaxed*/
>>>> +		chip->timeout_wait_stat_min = TPM_ATML_TIMEOUT_WAIT_STAT_MIN;
>>>> +		chip->timeout_wait_stat_max = TPM_ATML_TIMEOUT_WAIT_STAT_MAX;
>>>> +	}
>>>> +
>>>> 	rc = tpm_tis_read8(priv, TPM_RID(0), &rid);
>>>> 	if (rc < 0)
>>>> 		goto out_err;
>>>> diff --git a/drivers/char/tpm/tpm_tis_core.h b/drivers/char/tpm/tpm_tis_core.h
>>>> index 9b2d32a59f67..2e431beb44f7 100644
>>>> --- a/drivers/char/tpm/tpm_tis_core.h
>>>> +++ b/drivers/char/tpm/tpm_tis_core.h
>>>> @@ -54,6 +54,8 @@ enum tis_defaults {
>>>> 	TIS_MEM_LEN = 0x5000,
>>>> 	TIS_SHORT_TIMEOUT = 750,	/* ms */
>>>> 	TIS_LONG_TIMEOUT = 2000,	/* 2 sec */
>>>> +	TPM_ATML_TIMEOUT_WAIT_STAT_MIN = 14700,	/* usecs */
>>>> +	TPM_ATML_TIMEOUT_WAIT_STAT_MAX = 15000,	/* usecs */
>>>> };
>> 
>> I'd prefer TIS_TIMEOUT_{MIN, MAX}_ATML. I.e. no "WAIT_STAT" and without "TPM_"
>> to be consistent with other constants here.
> Ok will do
>> 
>>>> 
>>>> /* Some timeout values are needed before it is known whether the chip is
>>>> diff --git a/include/linux/tpm.h b/include/linux/tpm.h
>>>> index aa11fe323c56..171b9102c976 100644
>>>> --- a/include/linux/tpm.h
>>>> +++ b/include/linux/tpm.h
>>>> @@ -150,6 +150,8 @@ struct tpm_chip {
>>>> 	bool timeout_adjusted;
>>>> 	unsigned long duration[TPM_NUM_DURATIONS]; /* jiffies */
>>>> 	bool duration_adjusted;
>>>> +	unsigned int timeout_wait_stat_min; /* usecs */
>>>> +	unsigned int timeout_wait_stat_max; /* usecs */
>> 
>> Please rename as timeout_{min, max}.
> Ok will do
>> 
>> And I think tpm_chip is wrong place to put them as they are TIS
>> specific, i.e. they should be in tpm_tis_data.
> Sorry, I am not familiar with tpm_tis_data, could tell the the place that you want me to put the var? 
> I think I may have hard time to move forward according toward this comment due to bandwidth constraints.
> Some helps would be appreciated. 
> 
> Is tpm_tis_data something specific to a chip instance ? Given the values are tied to chip,
> we need chip specific instance to make this work.

Hi Jarkko, I have checked about your proposal a bit. It look slike we need to 
Run “struct tpm_tis_data *priv = dev_get_drvdata(&chip->dev)” in every wait_for_tpm_stat call. Would this be a performance concern ? 
If we cache this in tpm_chip instance, it is not the case. 

Please let me know your thought.

Hao 

>> 
>>>> 
>>>> 	struct dentry *bios_dir[TPM_NUM_EVENT_LOG_FILES];
>>>> 
>>>> @@ -269,6 +271,7 @@ enum tpm2_cc_attrs {
>>>> #define TPM_VID_INTEL    0x8086
>>>> #define TPM_VID_WINBOND  0x1050
>>>> #define TPM_VID_STM      0x104A
>>>> +#define TPM_VID_ATML     0x1114
>>>> 
>>>> enum tpm_chip_flags {
>>>> 	TPM_CHIP_FLAG_TPM2		= BIT(1),
>>>> -- 
>>>> 2.29.0.vfs.0.0
>>>> 
>>> 
>>> Just kindly remind this code review in case it has been missed somehow
>> 
>> I'm sorry, my bad. I managed to somehow miss this. Might be because
>> I've been recently reorganizing my email accounts. And thanks for
>> pinging so that I spotted it.
> No worries, thanks for quick response!
> 
>> 
>>> Thanks
>>> Hao
>> 
>> /Jarkko
> 
> Hao


^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: [PATCH v4] tpm: fix Atmel TPM crash caused by too frequent queries
  2021-09-04 21:14                 ` Hao Wu
@ 2021-09-04 23:15                   ` Hao Wu
  0 siblings, 0 replies; 91+ messages in thread
From: Hao Wu @ 2021-09-04 23:15 UTC (permalink / raw)
  To: Jarkko Sakkinen
  Cc: Shrihari Kalkar, Han Seungyeop, Anish Jhaveri, peterhuewe, jgg,
	linux-integrity, Paul Menzel, Ken Goldman, zohar, why2jjj.linux,
	Hamza Attak, gregkh, arnd, Nayna, James Bottomley



> On Sep 4, 2021, at 2:14 PM, Hao Wu <hao.wu@rubrik.com> wrote:
> 
> 
> 
>> On Aug 26, 2021, at 5:35 PM, Hao Wu <hao.wu@rubrik.com> wrote:
>> 
>>> On Aug 26, 2021, at 9:24 AM, Jarkko Sakkinen <jarkko@kernel.org> wrote:
>>> 
>>> On Wed, 2021-08-25 at 22:38 -0700, Hao Wu wrote:
>>>>> On Aug 14, 2021, at 3:25 PM, Hao Wu <hao.wu@rubrik.com> wrote:
>>>>> 
>>>>> The Atmel TPM 1.2 chips crash with error
>>>>> `tpm_try_transmit: send(): error -62` since kernel 4.14.
>>>>> It is observed from the kernel log after running `tpm_sealdata -z`.
>>>>> The error thrown from the command is as follows
>>>>> ```
>>>>> $ tpm_sealdata -z
>>>>> Tspi_Key_LoadKey failed: 0x00001087 - layer=tddl,
>>>>> code=0087 (135), I/O error
>>>>> ```
>>>>> 
>>>>> The issue was reproduced with the following Atmel TPM chip:
>>>>> ```
>>>>> $ tpm_version
>>>>> T0  TPM 1.2 Version Info:
>>>>> Chip Version:        1.2.66.1
>>>>> Spec Level:          2
>>>>> Errata Revision:     3
>>>>> TPM Vendor ID:       ATML
>>>>> TPM Version:         01010000
>>>>> Manufacturer Info:   41544d4c
>>>>> ```
>>>>> 
>>>>> The root cause of the issue is due to the TPM calls to msleep()
>>>>> were replaced with usleep_range() [1], which reduces
>>>>> the actual timeout. Via experiments, it is observed that
>>>>> the original msleep(5) actually sleeps for 15ms.
>>>>> Because of a known timeout issue in Atmel TPM 1.2 chip,
>>>>> the shorter timeout than 15ms can cause the error described above.
>>>>> 
>>>>> A few further changes in kernel 4.16 [2] and 4.18 [3, 4] further
>>>>> reduced the timeout to less than 1ms. With experiments,
>>>>> the problematic timeout in the latest kernel is the one
>>>>> for `wait_for_tpm_stat`.
>>>>> 
>>>>> To fix it, the patch reverts the timeout of `wait_for_tpm_stat`
>>>>> to 15ms for all Atmel TPM 1.2 chips, but leave it untouched
>>>>> for Ateml TPM 2.0 chip, and chips from other vendors.
>>>>> As explained above, the chosen 15ms timeout is
>>>>> the actual timeout before this issue introduced,
>>>>> thus the old value is used here.
>>>>> Particularly, TPM_ATML_TIMEOUT_WAIT_STAT_MIN is set to 14700us,
>>>>> TPM_ATML_TIMEOUT_WAIT_STAT_MIN is set to 15000us according to
>>>>> the existing TPM_TIMEOUT_RANGE_US (300us).
>>>>> The fixed has been tested in the system with the affected Atmel chip
>>>>> with no issues observed after boot up.
>>>>> 
>>>>> References:
>>>>> [1] 9f3fc7bcddcb tpm: replace msleep() with usleep_range() in TPM
>>>>> 1.2/2.0 generic drivers
>>>>> [2] cf151a9a44d5 tpm: reduce tpm polling delay in tpm_tis_core
>>>>> [3] 59f5a6b07f64 tpm: reduce poll sleep time in tpm_transmit()
>>>>> [4] 424eaf910c32 tpm: reduce polling time to usecs for even finer
>>>>> granularity
>>>>> 
>>>>> Fixes: 9f3fc7bcddcb ("tpm: replace msleep() with usleep_range() in TPM 1.2/2.0 generic drivers")
>>>>> Link: https://patchwork.kernel.org/project/linux-integrity/patch/20200926223150.109645-1-hao.wu@rubrik.com/
>>>>> Signed-off-by: Hao Wu <hao.wu@rubrik.com>
>>>>> ---
>>>>> v4:
>>>>> - Move timeout constants to drivers/char/tpm/tpm_tis_core.h
>>>>> - Cleanup unnecessary inline comment
>>>>> 
>>>>> v3:
>>>>> - removes unnecessary condition check in `wait_for_tpm_stat`
>>>>> 
>>>>> v2:
>>>>> - follow the existing way to define two timeouts (min and max)
>>>>> for ATMEL chip, thus keep the exact timeout logic for 
>>>>> non-ATEML chips.
>>>>> - limit the timeout increase to only ATMEL TPM 1.2 chips,
>>>>> because it is not an issue for TPM 2.0 chips yet.
>>>>> 
>>>>> Test Plan:
>>>>> - Run fixed kernel with ATMEL TPM chips and see crash
>>>>> has been fixed.
>>>>> - Run fixed kernel with non-ATMEL TPM chips, and confirm
>>>>> the timeout has not been changed.
>>>>> 
>>>>> drivers/char/tpm/tpm_tis_core.c | 13 +++++++++++--
>>>>> drivers/char/tpm/tpm_tis_core.h |  2 ++
>>>>> include/linux/tpm.h             |  3 +++
>>>>> 3 files changed, 16 insertions(+), 2 deletions(-)
>>>>> 
>>>>> diff --git a/drivers/char/tpm/tpm_tis_core.c b/drivers/char/tpm/tpm_tis_core.c
>>>>> index 55b9d3965ae1..24605f100e96 100644
>>>>> --- a/drivers/char/tpm/tpm_tis_core.c
>>>>> +++ b/drivers/char/tpm/tpm_tis_core.c
>>>>> @@ -80,8 +80,8 @@ static int wait_for_tpm_stat(struct tpm_chip *chip, u8 mask,
>>>>> 		}
>>>>> 	} else {
>>>>> 		do {
>>>>> -			usleep_range(TPM_TIMEOUT_USECS_MIN,
>>>>> -				     TPM_TIMEOUT_USECS_MAX);
>>>>> +			usleep_range(chip->timeout_wait_stat_min,
>>>>> +				     chip->timeout_wait_stat_max);
>>>>> 			status = chip->ops->status(chip);
>>>>> 			if ((status & mask) == mask)
>>>>> 				return 0;
>>>>> @@ -934,6 +934,8 @@ int tpm_tis_core_init(struct device *dev, struct tpm_tis_data *priv, int irq,
>>>>> 	chip->timeout_b = msecs_to_jiffies(TIS_TIMEOUT_B_MAX);
>>>>> 	chip->timeout_c = msecs_to_jiffies(TIS_TIMEOUT_C_MAX);
>>>>> 	chip->timeout_d = msecs_to_jiffies(TIS_TIMEOUT_D_MAX);
>>>>> +	chip->timeout_wait_stat_min = TPM_TIMEOUT_USECS_MIN;
>>>>> +	chip->timeout_wait_stat_max = TPM_TIMEOUT_USECS_MAX;
>>>>> 	priv->phy_ops = phy_ops;
>>>>> 	dev_set_drvdata(&chip->dev, priv);
>>>>> 
>>>>> @@ -983,6 +985,13 @@ int tpm_tis_core_init(struct device *dev, struct tpm_tis_data *priv, int irq,
>>>>> 
>>>>> 	priv->manufacturer_id = vendor;
>>>>> 
>>>>> +	if (priv->manufacturer_id == TPM_VID_ATML &&
>>>>> +		!(chip->flags & TPM_CHIP_FLAG_TPM2)) {
>>>>> +		/* If TPM chip is 1.2 ATMEL chip, timeout need to be relaxed*/
>>>>> +		chip->timeout_wait_stat_min = TPM_ATML_TIMEOUT_WAIT_STAT_MIN;
>>>>> +		chip->timeout_wait_stat_max = TPM_ATML_TIMEOUT_WAIT_STAT_MAX;
>>>>> +	}
>>>>> +
>>>>> 	rc = tpm_tis_read8(priv, TPM_RID(0), &rid);
>>>>> 	if (rc < 0)
>>>>> 		goto out_err;
>>>>> diff --git a/drivers/char/tpm/tpm_tis_core.h b/drivers/char/tpm/tpm_tis_core.h
>>>>> index 9b2d32a59f67..2e431beb44f7 100644
>>>>> --- a/drivers/char/tpm/tpm_tis_core.h
>>>>> +++ b/drivers/char/tpm/tpm_tis_core.h
>>>>> @@ -54,6 +54,8 @@ enum tis_defaults {
>>>>> 	TIS_MEM_LEN = 0x5000,
>>>>> 	TIS_SHORT_TIMEOUT = 750,	/* ms */
>>>>> 	TIS_LONG_TIMEOUT = 2000,	/* 2 sec */
>>>>> +	TPM_ATML_TIMEOUT_WAIT_STAT_MIN = 14700,	/* usecs */
>>>>> +	TPM_ATML_TIMEOUT_WAIT_STAT_MAX = 15000,	/* usecs */
>>>>> };
>>> 
>>> I'd prefer TIS_TIMEOUT_{MIN, MAX}_ATML. I.e. no "WAIT_STAT" and without "TPM_"
>>> to be consistent with other constants here.
>> Ok will do
>>> 
>>>>> 
>>>>> /* Some timeout values are needed before it is known whether the chip is
>>>>> diff --git a/include/linux/tpm.h b/include/linux/tpm.h
>>>>> index aa11fe323c56..171b9102c976 100644
>>>>> --- a/include/linux/tpm.h
>>>>> +++ b/include/linux/tpm.h
>>>>> @@ -150,6 +150,8 @@ struct tpm_chip {
>>>>> 	bool timeout_adjusted;
>>>>> 	unsigned long duration[TPM_NUM_DURATIONS]; /* jiffies */
>>>>> 	bool duration_adjusted;
>>>>> +	unsigned int timeout_wait_stat_min; /* usecs */
>>>>> +	unsigned int timeout_wait_stat_max; /* usecs */
>>> 
>>> Please rename as timeout_{min, max}.
>> Ok will do
To be honest, this naming could be misleading, because the timeout here only applies to wait_stat use case. 
But I will just follow you suggestion anyway.

Hao

>>> 
>>> And I think tpm_chip is wrong place to put them as they are TIS
>>> specific, i.e. they should be in tpm_tis_data.
>> Sorry, I am not familiar with tpm_tis_data, could tell the the place that you want me to put the var? 
>> I think I may have hard time to move forward according toward this comment due to bandwidth constraints.
>> Some helps would be appreciated. 
>> 
>> Is tpm_tis_data something specific to a chip instance ? Given the values are tied to chip,
>> we need chip specific instance to make this work.
> 
> Hi Jarkko, I have checked about your proposal a bit. It look slike we need to 
> Run “struct tpm_tis_data *priv = dev_get_drvdata(&chip->dev)” in every wait_for_tpm_stat call. Would this be a performance concern ? 
> If we cache this in tpm_chip instance, it is not the case. 
> 
> Please let me know your thought.
> 
> Hao 
> 
>>> 
>>>>> 
>>>>> 	struct dentry *bios_dir[TPM_NUM_EVENT_LOG_FILES];
>>>>> 
>>>>> @@ -269,6 +271,7 @@ enum tpm2_cc_attrs {
>>>>> #define TPM_VID_INTEL    0x8086
>>>>> #define TPM_VID_WINBOND  0x1050
>>>>> #define TPM_VID_STM      0x104A
>>>>> +#define TPM_VID_ATML     0x1114
>>>>> 
>>>>> enum tpm_chip_flags {
>>>>> 	TPM_CHIP_FLAG_TPM2		= BIT(1),
>>>>> -- 
>>>>> 2.29.0.vfs.0.0
>>>>> 
>>>> 
>>>> Just kindly remind this code review in case it has been missed somehow
>>> 
>>> I'm sorry, my bad. I managed to somehow miss this. Might be because
>>> I've been recently reorganizing my email accounts. And thanks for
>>> pinging so that I spotted it.
>> No worries, thanks for quick response!
>> 
>>> 
>>>> Thanks
>>>> Hao
>>> 
>>> /Jarkko
>> 
>> Hao
> 


^ permalink raw reply	[flat|nested] 91+ messages in thread

* [PATCH v5] tpm: fix Atmel TPM crash caused by too frequent queries
  2021-08-14 22:25         ` [PATCH v4] " Hao Wu
  2021-08-26  5:38           ` Hao Wu
@ 2021-09-05  3:51           ` Hao Wu
  2021-09-07 17:43             ` Jarkko Sakkinen
  1 sibling, 1 reply; 91+ messages in thread
From: Hao Wu @ 2021-09-05  3:51 UTC (permalink / raw)
  To: hao.wu, shrihari.kalkar, seungyeop.han, anish.jhaveri,
	peterhuewe, jarkko, jgg, linux-integrity, pmenzel, kgold, zohar,
	why2jjj.linux, hamza, gregkh, arnd, nayna, James.Bottomley

The Atmel TPM 1.2 chips crash with error
`tpm_try_transmit: send(): error -62` since kernel 4.14.
It is observed from the kernel log after running `tpm_sealdata -z`.
The error thrown from the command is as follows
```
$ tpm_sealdata -z
Tspi_Key_LoadKey failed: 0x00001087 - layer=tddl,
code=0087 (135), I/O error
```

The issue was reproduced with the following Atmel TPM chip:
```
$ tpm_version
T0  TPM 1.2 Version Info:
  Chip Version:        1.2.66.1
  Spec Level:          2
  Errata Revision:     3
  TPM Vendor ID:       ATML
  TPM Version:         01010000
  Manufacturer Info:   41544d4c
```

The root cause of the issue is due to the TPM calls to msleep()
were replaced with usleep_range() [1], which reduces
the actual timeout. Via experiments, it is observed that
the original msleep(5) actually sleeps for 15ms.
Because of a known timeout issue in Atmel TPM 1.2 chip,
the shorter timeout than 15ms can cause the error described above.

A few further changes in kernel 4.16 [2] and 4.18 [3, 4] further
reduced the timeout to less than 1ms. With experiments,
the problematic timeout in the latest kernel is the one
for `wait_for_tpm_stat`.

To fix it, the patch reverts the timeout of `wait_for_tpm_stat`
to 15ms for all Atmel TPM 1.2 chips, but leave it untouched
for Ateml TPM 2.0 chip, and chips from other vendors.
As explained above, the chosen 15ms timeout is
the actual timeout before this issue introduced,
thus the old value is used here.
Particularly, TPM_ATML_TIMEOUT_WAIT_STAT_MIN is set to 14700us,
TPM_ATML_TIMEOUT_WAIT_STAT_MIN is set to 15000us according to
the existing TPM_TIMEOUT_RANGE_US (300us).
The fixed has been tested in the system with the affected Atmel chip
with no issues observed after boot up.

References:
[1] 9f3fc7bcddcb tpm: replace msleep() with usleep_range() in TPM
1.2/2.0 generic drivers
[2] cf151a9a44d5 tpm: reduce tpm polling delay in tpm_tis_core
[3] 59f5a6b07f64 tpm: reduce poll sleep time in tpm_transmit()
[4] 424eaf910c32 tpm: reduce polling time to usecs for even finer
granularity

Fixes: 9f3fc7bcddcb ("tpm: replace msleep() with usleep_range() in TPM 1.2/2.0 generic drivers")
Link: https://patchwork.kernel.org/project/linux-integrity/patch/20200926223150.109645-1-hao.wu@rubrik.com/
Signed-off-by: Hao Wu <hao.wu@rubrik.com>
---
v5:
- Rename variables according to feedbacks
- Move timeout min/max to tpm_tis_data

v4:
- Move timeout constants to drivers/char/tpm/tpm_tis_core.h
- Cleanup unnecessary inline comment

v3:
- removes unnecessary condition check in `wait_for_tpm_stat`

v2:
- follow the existing way to define two timeouts (min and max)
  for ATMEL chip, thus keep the exact timeout logic for 
  non-ATEML chips.
- limit the timeout increase to only ATMEL TPM 1.2 chips,
  because it is not an issue for TPM 2.0 chips yet.

Test Plan:
- Run fixed kernel with ATMEL TPM chips and see crash
has been fixed.
- Run fixed kernel with non-ATMEL TPM chips, and confirm
the timeout has not been changed.

 drivers/char/tpm/tpm_tis_core.c | 27 +++++++++++++++++++--------
 drivers/char/tpm/tpm_tis_core.h |  4 ++++
 include/linux/tpm.h             |  1 +
 3 files changed, 24 insertions(+), 8 deletions(-)

diff --git a/drivers/char/tpm/tpm_tis_core.c b/drivers/char/tpm/tpm_tis_core.c
index 55b9d3965ae1..29de383aec5f 100644
--- a/drivers/char/tpm/tpm_tis_core.c
+++ b/drivers/char/tpm/tpm_tis_core.c
@@ -79,9 +79,10 @@ static int wait_for_tpm_stat(struct tpm_chip *chip, u8 mask,
 			goto again;
 		}
 	} else {
+		struct tpm_tis_data *priv = dev_get_drvdata(&chip->dev);
 		do {
-			usleep_range(TPM_TIMEOUT_USECS_MIN,
-				     TPM_TIMEOUT_USECS_MAX);
+			usleep_range(priv->timeout_min,
+				     priv->timeout_max);
 			status = chip->ops->status(chip);
 			if ((status & mask) == mask)
 				return 0;
@@ -934,7 +935,23 @@ int tpm_tis_core_init(struct device *dev, struct tpm_tis_data *priv, int irq,
 	chip->timeout_b = msecs_to_jiffies(TIS_TIMEOUT_B_MAX);
 	chip->timeout_c = msecs_to_jiffies(TIS_TIMEOUT_C_MAX);
 	chip->timeout_d = msecs_to_jiffies(TIS_TIMEOUT_D_MAX);
+	priv->timeout_min = TPM_TIMEOUT_USECS_MIN;
+	priv->timeout_max = TPM_TIMEOUT_USECS_MAX;
 	priv->phy_ops = phy_ops;
+
+	rc = tpm_tis_read32(priv, TPM_DID_VID(0), &vendor);
+	if (rc < 0)
+		goto out_err;
+
+	priv->manufacturer_id = vendor;
+
+	if (priv->manufacturer_id == TPM_VID_ATML &&
+		!(chip->flags & TPM_CHIP_FLAG_TPM2)) {
+		/* If TPM chip is 1.2 ATMEL chip, timeout need to be relaxed*/
+		priv->timeout_min = TIS_TIMEOUT_MIN_ATML;
+		priv->timeout_max = TIS_TIMEOUT_MAX_ATML;
+	}
+
 	dev_set_drvdata(&chip->dev, priv);
 
 	if (is_bsw()) {
@@ -977,12 +994,6 @@ int tpm_tis_core_init(struct device *dev, struct tpm_tis_data *priv, int irq,
 	if (rc)
 		goto out_err;
 
-	rc = tpm_tis_read32(priv, TPM_DID_VID(0), &vendor);
-	if (rc < 0)
-		goto out_err;
-
-	priv->manufacturer_id = vendor;
-
 	rc = tpm_tis_read8(priv, TPM_RID(0), &rid);
 	if (rc < 0)
 		goto out_err;
diff --git a/drivers/char/tpm/tpm_tis_core.h b/drivers/char/tpm/tpm_tis_core.h
index 9b2d32a59f67..c33f27c929f4 100644
--- a/drivers/char/tpm/tpm_tis_core.h
+++ b/drivers/char/tpm/tpm_tis_core.h
@@ -54,6 +54,8 @@ enum tis_defaults {
 	TIS_MEM_LEN = 0x5000,
 	TIS_SHORT_TIMEOUT = 750,	/* ms */
 	TIS_LONG_TIMEOUT = 2000,	/* 2 sec */
+	TIS_TIMEOUT_MIN_ATML = 14700,	/* usecs */
+	TIS_TIMEOUT_MAX_ATML = 15000,	/* usecs */
 };
 
 /* Some timeout values are needed before it is known whether the chip is
@@ -97,6 +99,8 @@ struct tpm_tis_data {
 	wait_queue_head_t read_queue;
 	const struct tpm_tis_phy_ops *phy_ops;
 	unsigned short rng_quality;
+	unsigned int timeout_min; /* usecs */
+	unsigned int timeout_max; /* usecs */
 };
 
 struct tpm_tis_phy_ops {
diff --git a/include/linux/tpm.h b/include/linux/tpm.h
index aa11fe323c56..12d827734686 100644
--- a/include/linux/tpm.h
+++ b/include/linux/tpm.h
@@ -269,6 +269,7 @@ enum tpm2_cc_attrs {
 #define TPM_VID_INTEL    0x8086
 #define TPM_VID_WINBOND  0x1050
 #define TPM_VID_STM      0x104A
+#define TPM_VID_ATML     0x1114
 
 enum tpm_chip_flags {
 	TPM_CHIP_FLAG_TPM2		= BIT(1),
-- 
2.29.0.vfs.0.0


^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: [PATCH v5] tpm: fix Atmel TPM crash caused by too frequent queries
  2021-09-05  3:51           ` [PATCH v5] " Hao Wu
@ 2021-09-07 17:43             ` Jarkko Sakkinen
  2021-09-08  8:33               ` Hao Wu
  0 siblings, 1 reply; 91+ messages in thread
From: Jarkko Sakkinen @ 2021-09-07 17:43 UTC (permalink / raw)
  To: Hao Wu, shrihari.kalkar, seungyeop.han, anish.jhaveri,
	peterhuewe, jgg, linux-integrity, pmenzel, kgold, zohar,
	why2jjj.linux, hamza, gregkh, arnd, nayna, James.Bottomley

On Sat, 2021-09-04 at 20:51 -0700, Hao Wu wrote:
> The Atmel TPM 1.2 chips crash with error
> `tpm_try_transmit: send(): error -62` since kernel 4.14.
> It is observed from the kernel log after running `tpm_sealdata -z`.
> The error thrown from the command is as follows
> ```
> $ tpm_sealdata -z
> Tspi_Key_LoadKey failed: 0x00001087 - layer=tddl,
> code=0087 (135), I/O error
> ```
> 
> The issue was reproduced with the following Atmel TPM chip:
> ```
> $ tpm_version
> T0  TPM 1.2 Version Info:
>   Chip Version:        1.2.66.1
>   Spec Level:          2
>   Errata Revision:     3
>   TPM Vendor ID:       ATML
>   TPM Version:         01010000
>   Manufacturer Info:   41544d4c
> ```
> 
> The root cause of the issue is due to the TPM calls to msleep()
> were replaced with usleep_range() [1], which reduces
> the actual timeout. Via experiments, it is observed that
> the original msleep(5) actually sleeps for 15ms.
> Because of a known timeout issue in Atmel TPM 1.2 chip,
> the shorter timeout than 15ms can cause the error described above.
> 
> A few further changes in kernel 4.16 [2] and 4.18 [3, 4] further
> reduced the timeout to less than 1ms. With experiments,
> the problematic timeout in the latest kernel is the one
> for `wait_for_tpm_stat`.
> 
> To fix it, the patch reverts the timeout of `wait_for_tpm_stat`
> to 15ms for all Atmel TPM 1.2 chips, but leave it untouched
> for Ateml TPM 2.0 chip, and chips from other vendors.
> As explained above, the chosen 15ms timeout is
> the actual timeout before this issue introduced,
> thus the old value is used here.
> Particularly, TPM_ATML_TIMEOUT_WAIT_STAT_MIN is set to 14700us,
> TPM_ATML_TIMEOUT_WAIT_STAT_MIN is set to 15000us according to
> the existing TPM_TIMEOUT_RANGE_US (300us).
> The fixed has been tested in the system with the affected Atmel chip
> with no issues observed after boot up.
> 
> References:
> [1] 9f3fc7bcddcb tpm: replace msleep() with usleep_range() in TPM
> 1.2/2.0 generic drivers
> [2] cf151a9a44d5 tpm: reduce tpm polling delay in tpm_tis_core
> [3] 59f5a6b07f64 tpm: reduce poll sleep time in tpm_transmit()
> [4] 424eaf910c32 tpm: reduce polling time to usecs for even finer
> granularity
> 
> Fixes: 9f3fc7bcddcb ("tpm: replace msleep() with usleep_range() in TPM 1.2/2.0 generic drivers")
> Link: https://patchwork.kernel.org/project/linux-integrity/patch/20200926223150.109645-1-hao.wu@rubrik.com/
> Signed-off-by: Hao Wu <hao.wu@rubrik.com>
> ---
> v5:
> - Rename variables according to feedbacks
> - Move timeout min/max to tpm_tis_data
> 
> v4:
> - Move timeout constants to drivers/char/tpm/tpm_tis_core.h
> - Cleanup unnecessary inline comment
> 
> v3:
> - removes unnecessary condition check in `wait_for_tpm_stat`
> 
> v2:
> - follow the existing way to define two timeouts (min and max)
>   for ATMEL chip, thus keep the exact timeout logic for 
>   non-ATEML chips.
> - limit the timeout increase to only ATMEL TPM 1.2 chips,
>   because it is not an issue for TPM 2.0 chips yet.
> 
> Test Plan:
> - Run fixed kernel with ATMEL TPM chips and see crash
> has been fixed.
> - Run fixed kernel with non-ATMEL TPM chips, and confirm
> the timeout has not been changed.
> 
>  drivers/char/tpm/tpm_tis_core.c | 27 +++++++++++++++++++--------
>  drivers/char/tpm/tpm_tis_core.h |  4 ++++
>  include/linux/tpm.h             |  1 +
>  3 files changed, 24 insertions(+), 8 deletions(-)
> 

I just noticed that these are part of the same email thread from
lore.kernel.org. Please always use separate thread. E.g. I'm not sure if
this would play out well with tooling such as b4 that can pick up patch
sets from lore.


> diff --git a/drivers/char/tpm/tpm_tis_core.c b/drivers/char/tpm/tpm_tis_core.c
> index 55b9d3965ae1..29de383aec5f 100644
> --- a/drivers/char/tpm/tpm_tis_core.c
> +++ b/drivers/char/tpm/tpm_tis_core.c
> @@ -79,9 +79,10 @@ static int wait_for_tpm_stat(struct tpm_chip *chip, u8 mask,
>  			goto again;
>  		}
>  	} else {
> +		struct tpm_tis_data *priv = dev_get_drvdata(&chip->dev);

Move this declaration to the beginning of the function.

>  		do {
> -			usleep_range(TPM_TIMEOUT_USECS_MIN,
> -				     TPM_TIMEOUT_USECS_MAX);
> +			usleep_range(priv->timeout_min,
> +				     priv->timeout_max);
>  			status = chip->ops->status(chip);
>  			if ((status & mask) == mask)
>  				return 0;
> @@ -934,7 +935,23 @@ int tpm_tis_core_init(struct device *dev, struct tpm_tis_data *priv, int irq,
>  	chip->timeout_b = msecs_to_jiffies(TIS_TIMEOUT_B_MAX);
>  	chip->timeout_c = msecs_to_jiffies(TIS_TIMEOUT_C_MAX);
>  	chip->timeout_d = msecs_to_jiffies(TIS_TIMEOUT_D_MAX);
> +	priv->timeout_min = TPM_TIMEOUT_USECS_MIN;
> +	priv->timeout_max = TPM_TIMEOUT_USECS_MAX;
>  	priv->phy_ops = phy_ops;
> +
> +	rc = tpm_tis_read32(priv, TPM_DID_VID(0), &vendor);
> +	if (rc < 0)
> +		goto out_err;
> +
> +	priv->manufacturer_id = vendor;
> +
> +	if (priv->manufacturer_id == TPM_VID_ATML &&
> +		!(chip->flags & TPM_CHIP_FLAG_TPM2)) {
> +		/* If TPM chip is 1.2 ATMEL chip, timeout need to be relaxed*/
                                                        
A ' ' character missing before the last asterisk.

Also the comment is just in English the same exact thing already
clearly expressed by the if-statement, so it's better that you
just remove the comment altogether.

> +		priv->timeout_min = TIS_TIMEOUT_MIN_ATML;
> +		priv->timeout_max = TIS_TIMEOUT_MAX_ATML;
> +	}
> +
>  	dev_set_drvdata(&chip->dev, priv);
>  
>  	if (is_bsw()) {
> @@ -977,12 +994,6 @@ int tpm_tis_core_init(struct device *dev, struct tpm_tis_data *priv, int irq,
>  	if (rc)
>  		goto out_err;
>  
> -	rc = tpm_tis_read32(priv, TPM_DID_VID(0), &vendor);
> -	if (rc < 0)
> -		goto out_err;
> -
> -	priv->manufacturer_id = vendor;
> -
>  	rc = tpm_tis_read8(priv, TPM_RID(0), &rid);
>  	if (rc < 0)
>  		goto out_err;
> diff --git a/drivers/char/tpm/tpm_tis_core.h b/drivers/char/tpm/tpm_tis_core.h
> index 9b2d32a59f67..c33f27c929f4 100644
> --- a/drivers/char/tpm/tpm_tis_core.h
> +++ b/drivers/char/tpm/tpm_tis_core.h
> @@ -54,6 +54,8 @@ enum tis_defaults {
>  	TIS_MEM_LEN = 0x5000,
>  	TIS_SHORT_TIMEOUT = 750,	/* ms */
>  	TIS_LONG_TIMEOUT = 2000,	/* 2 sec */
> +	TIS_TIMEOUT_MIN_ATML = 14700,	/* usecs */
> +	TIS_TIMEOUT_MAX_ATML = 15000,	/* usecs */
>  };
>  
>  /* Some timeout values are needed before it is known whether the chip is
> @@ -97,6 +99,8 @@ struct tpm_tis_data {
>  	wait_queue_head_t read_queue;
>  	const struct tpm_tis_phy_ops *phy_ops;
>  	unsigned short rng_quality;
> +	unsigned int timeout_min; /* usecs */
> +	unsigned int timeout_max; /* usecs */
>  };
>  
>  struct tpm_tis_phy_ops {
> diff --git a/include/linux/tpm.h b/include/linux/tpm.h
> index aa11fe323c56..12d827734686 100644
> --- a/include/linux/tpm.h
> +++ b/include/linux/tpm.h
> @@ -269,6 +269,7 @@ enum tpm2_cc_attrs {
>  #define TPM_VID_INTEL    0x8086
>  #define TPM_VID_WINBOND  0x1050
>  #define TPM_VID_STM      0x104A
> +#define TPM_VID_ATML     0x1114
>  
>  enum tpm_chip_flags {
>  	TPM_CHIP_FLAG_TPM2		= BIT(1),

Looking good other than a those minor nitpicks. Please send the next as
a separate thread, and *not* as response, so that it can be picked up.

/Jarkko


^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: [PATCH v5] tpm: fix Atmel TPM crash caused by too frequent queries
  2021-09-07 17:43             ` Jarkko Sakkinen
@ 2021-09-08  8:33               ` Hao Wu
  0 siblings, 0 replies; 91+ messages in thread
From: Hao Wu @ 2021-09-08  8:33 UTC (permalink / raw)
  To: Jarkko Sakkinen
  Cc: Shrihari Kalkar, Seungyeop Han, Anish Jhaveri, peterhuewe, jgg,
	linux-integrity, Paul Menzel, Ken Goldman, zohar, why2jjj.linux,
	Hamza Attak, gregkh, arnd, Nayna, James.Bottomley


> On Sep 7, 2021, at 10:43 AM, Jarkko Sakkinen <jarkko@kernel.org> wrote:
> 
> On Sat, 2021-09-04 at 20:51 -0700, Hao Wu wrote:
>> The Atmel TPM 1.2 chips crash with error
>> `tpm_try_transmit: send(): error -62` since kernel 4.14.
>> It is observed from the kernel log after running `tpm_sealdata -z`.
>> The error thrown from the command is as follows
>> ```
>> $ tpm_sealdata -z
>> Tspi_Key_LoadKey failed: 0x00001087 - layer=tddl,
>> code=0087 (135), I/O error
>> ```
>> 
>> The issue was reproduced with the following Atmel TPM chip:
>> ```
>> $ tpm_version
>> T0  TPM 1.2 Version Info:
>>  Chip Version:        1.2.66.1
>>  Spec Level:          2
>>  Errata Revision:     3
>>  TPM Vendor ID:       ATML
>>  TPM Version:         01010000
>>  Manufacturer Info:   41544d4c
>> ```
>> 
>> The root cause of the issue is due to the TPM calls to msleep()
>> were replaced with usleep_range() [1], which reduces
>> the actual timeout. Via experiments, it is observed that
>> the original msleep(5) actually sleeps for 15ms.
>> Because of a known timeout issue in Atmel TPM 1.2 chip,
>> the shorter timeout than 15ms can cause the error described above.
>> 
>> A few further changes in kernel 4.16 [2] and 4.18 [3, 4] further
>> reduced the timeout to less than 1ms. With experiments,
>> the problematic timeout in the latest kernel is the one
>> for `wait_for_tpm_stat`.
>> 
>> To fix it, the patch reverts the timeout of `wait_for_tpm_stat`
>> to 15ms for all Atmel TPM 1.2 chips, but leave it untouched
>> for Ateml TPM 2.0 chip, and chips from other vendors.
>> As explained above, the chosen 15ms timeout is
>> the actual timeout before this issue introduced,
>> thus the old value is used here.
>> Particularly, TPM_ATML_TIMEOUT_WAIT_STAT_MIN is set to 14700us,
>> TPM_ATML_TIMEOUT_WAIT_STAT_MIN is set to 15000us according to
>> the existing TPM_TIMEOUT_RANGE_US (300us).
>> The fixed has been tested in the system with the affected Atmel chip
>> with no issues observed after boot up.
>> 
>> References:
>> [1] 9f3fc7bcddcb tpm: replace msleep() with usleep_range() in TPM
>> 1.2/2.0 generic drivers
>> [2] cf151a9a44d5 tpm: reduce tpm polling delay in tpm_tis_core
>> [3] 59f5a6b07f64 tpm: reduce poll sleep time in tpm_transmit()
>> [4] 424eaf910c32 tpm: reduce polling time to usecs for even finer
>> granularity
>> 
>> Fixes: 9f3fc7bcddcb ("tpm: replace msleep() with usleep_range() in TPM 1.2/2.0 generic drivers")
>> Link: https://patchwork.kernel.org/project/linux-integrity/patch/20200926223150.109645-1-hao.wu@rubrik.com/
>> Signed-off-by: Hao Wu <hao.wu@rubrik.com>
>> ---
>> v5:
>> - Rename variables according to feedbacks
>> - Move timeout min/max to tpm_tis_data
>> 
>> v4:
>> - Move timeout constants to drivers/char/tpm/tpm_tis_core.h
>> - Cleanup unnecessary inline comment
>> 
>> v3:
>> - removes unnecessary condition check in `wait_for_tpm_stat`
>> 
>> v2:
>> - follow the existing way to define two timeouts (min and max)
>>  for ATMEL chip, thus keep the exact timeout logic for 
>>  non-ATEML chips.
>> - limit the timeout increase to only ATMEL TPM 1.2 chips,
>>  because it is not an issue for TPM 2.0 chips yet.
>> 
>> Test Plan:
>> - Run fixed kernel with ATMEL TPM chips and see crash
>> has been fixed.
>> - Run fixed kernel with non-ATMEL TPM chips, and confirm
>> the timeout has not been changed.
>> 
>> drivers/char/tpm/tpm_tis_core.c | 27 +++++++++++++++++++--------
>> drivers/char/tpm/tpm_tis_core.h |  4 ++++
>> include/linux/tpm.h             |  1 +
>> 3 files changed, 24 insertions(+), 8 deletions(-)
>> 
> 
> I just noticed that these are part of the same email thread from
> lore.kernel.org. Please always use separate thread. E.g. I'm not sure if
> this would play out well with tooling such as b4 that can pick up patch
> sets from lore.
I see. I thought I need to chain these. Will send a separate one.

> 
>> diff --git a/drivers/char/tpm/tpm_tis_core.c b/drivers/char/tpm/tpm_tis_core.c
>> index 55b9d3965ae1..29de383aec5f 100644
>> --- a/drivers/char/tpm/tpm_tis_core.c
>> +++ b/drivers/char/tpm/tpm_tis_core.c
>> @@ -79,9 +79,10 @@ static int wait_for_tpm_stat(struct tpm_chip *chip, u8 mask,
>> 			goto again;
>> 		}
>> 	} else {
>> +		struct tpm_tis_data *priv = dev_get_drvdata(&chip->dev);
> 
> Move this declaration to the beginning of the function.
OK

>> 		do {
>> -			usleep_range(TPM_TIMEOUT_USECS_MIN,
>> -				     TPM_TIMEOUT_USECS_MAX);
>> +			usleep_range(priv->timeout_min,
>> +				     priv->timeout_max);
>> 			status = chip->ops->status(chip);
>> 			if ((status & mask) == mask)
>> 				return 0;
>> @@ -934,7 +935,23 @@ int tpm_tis_core_init(struct device *dev, struct tpm_tis_data *priv, int irq,
>> 	chip->timeout_b = msecs_to_jiffies(TIS_TIMEOUT_B_MAX);
>> 	chip->timeout_c = msecs_to_jiffies(TIS_TIMEOUT_C_MAX);
>> 	chip->timeout_d = msecs_to_jiffies(TIS_TIMEOUT_D_MAX);
>> +	priv->timeout_min = TPM_TIMEOUT_USECS_MIN;
>> +	priv->timeout_max = TPM_TIMEOUT_USECS_MAX;
>> 	priv->phy_ops = phy_ops;
>> +
>> +	rc = tpm_tis_read32(priv, TPM_DID_VID(0), &vendor);
>> +	if (rc < 0)
>> +		goto out_err;
>> +
>> +	priv->manufacturer_id = vendor;
>> +
>> +	if (priv->manufacturer_id == TPM_VID_ATML &&
>> +		!(chip->flags & TPM_CHIP_FLAG_TPM2)) {
>> +		/* If TPM chip is 1.2 ATMEL chip, timeout need to be relaxed*/
> 
> A ' ' character missing before the last asterisk.
> 
> Also the comment is just in English the same exact thing already
> clearly expressed by the if-statement, so it's better that you
> just remove the comment altogether.
Sure will remove it
> 
>> +		priv->timeout_min = TIS_TIMEOUT_MIN_ATML;
>> +		priv->timeout_max = TIS_TIMEOUT_MAX_ATML;
>> +	}
>> +
>> 	dev_set_drvdata(&chip->dev, priv);
>> 
>> 	if (is_bsw()) {
>> @@ -977,12 +994,6 @@ int tpm_tis_core_init(struct device *dev, struct tpm_tis_data *priv, int irq,
>> 	if (rc)
>> 		goto out_err;
>> 
>> -	rc = tpm_tis_read32(priv, TPM_DID_VID(0), &vendor);
>> -	if (rc < 0)
>> -		goto out_err;
>> -
>> -	priv->manufacturer_id = vendor;
>> -
>> 	rc = tpm_tis_read8(priv, TPM_RID(0), &rid);
>> 	if (rc < 0)
>> 		goto out_err;
>> diff --git a/drivers/char/tpm/tpm_tis_core.h b/drivers/char/tpm/tpm_tis_core.h
>> index 9b2d32a59f67..c33f27c929f4 100644
>> --- a/drivers/char/tpm/tpm_tis_core.h
>> +++ b/drivers/char/tpm/tpm_tis_core.h
>> @@ -54,6 +54,8 @@ enum tis_defaults {
>> 	TIS_MEM_LEN = 0x5000,
>> 	TIS_SHORT_TIMEOUT = 750,	/* ms */
>> 	TIS_LONG_TIMEOUT = 2000,	/* 2 sec */
>> +	TIS_TIMEOUT_MIN_ATML = 14700,	/* usecs */
>> +	TIS_TIMEOUT_MAX_ATML = 15000,	/* usecs */
>> };
>> 
>> /* Some timeout values are needed before it is known whether the chip is
>> @@ -97,6 +99,8 @@ struct tpm_tis_data {
>> 	wait_queue_head_t read_queue;
>> 	const struct tpm_tis_phy_ops *phy_ops;
>> 	unsigned short rng_quality;
>> +	unsigned int timeout_min; /* usecs */
>> +	unsigned int timeout_max; /* usecs */
>> };
>> 
>> struct tpm_tis_phy_ops {
>> diff --git a/include/linux/tpm.h b/include/linux/tpm.h
>> index aa11fe323c56..12d827734686 100644
>> --- a/include/linux/tpm.h
>> +++ b/include/linux/tpm.h
>> @@ -269,6 +269,7 @@ enum tpm2_cc_attrs {
>> #define TPM_VID_INTEL    0x8086
>> #define TPM_VID_WINBOND  0x1050
>> #define TPM_VID_STM      0x104A
>> +#define TPM_VID_ATML     0x1114
>> 
>> enum tpm_chip_flags {
>> 	TPM_CHIP_FLAG_TPM2		= BIT(1),
> 
> Looking good other than a those minor nitpicks. Please send the next as
> a separate thread, and *not* as response, so that it can be picked up.
> 
> /Jarkko
> 
Thanks!
Hao


^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: [PATCH] Fix Atmel TPM crash caused by too frequent queries
  2021-05-10  2:17                                                   ` Mimi Zohar
  2021-05-10  3:15                                                     ` Hao Wu
@ 2021-05-10 17:28                                                     ` Jarkko Sakkinen
  1 sibling, 0 replies; 91+ messages in thread
From: Jarkko Sakkinen @ 2021-05-10 17:28 UTC (permalink / raw)
  To: Mimi Zohar
  Cc: Hao Wu, Jarkko Sakkinen, James Bottomley, peterhuewe, jgg, Nayna,
	arnd, gregkh, Hamza Attak, why2jjj.linux, zohar, linux-integrity,
	Paul Menzel, Ken Goldman, Seungyeop Han, Shrihari Kalkar,
	Anish Jhaveri

On Sun, May 09, 2021 at 10:17:01PM -0400, Mimi Zohar wrote:
> [Cc'ing Jarkko Sakkinen <jarkko@kernel.org>]
> 
> On Sat, 2021-05-08 at 23:31 -0700, Hao Wu wrote:
> > > On May 8, 2021, at 11:18 PM, Hao Wu <hao.wu@rubrik.com> wrote:
> > > 
> > >> On Nov 18, 2020, at 1:11 PM, Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com> wrote:
> > >> 
> > >> On Fri, Nov 13, 2020 at 08:39:28PM -0800, Hao Wu wrote:
> > >>>> On Oct 17, 2020, at 10:20 PM, Hao Wu <hao.wu@rubrik.com> wrote:
> > >>>> 
> > >>>>> On Oct 17, 2020, at 10:09 PM, Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com> wrote:
> > >>>>> 
> > >>>>> On Fri, Oct 16, 2020 at 11:11:37PM -0700, Hao Wu wrote:
> > >>>>>>> On Oct 1, 2020, at 4:04 PM, Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com> wrote:
> > >>>>>>> 
> > >>>>>>> On Thu, Oct 01, 2020 at 11:32:59AM -0700, James Bottomley wrote:
> > >>>>>>>> On Thu, 2020-10-01 at 14:15 -0400, Nayna wrote:
> > >>>>>>>>> On 10/1/20 12:53 AM, James Bottomley wrote:
> > >>>>>>>>>> On Thu, 2020-10-01 at 04:50 +0300, Jarkko Sakkinen wrote:
> > >>>>>>>>>>> On Wed, Sep 30, 2020 at 03:31:20PM -0700, James Bottomley wrote:
> > >>>>>>>>>>>> On Thu, 2020-10-01 at 00:09 +0300, Jarkko Sakkinen wrote:
> > >>>>>>>> [...]
> > >>>>>>>>>>>>> I also wonder if we could adjust the frequency dynamically.
> > >>>>>>>>>>>>> I.e. start with optimistic value and lower it until finding
> > >>>>>>>>>>>>> the sweet spot.
> > >>>>>>>>>>>> 
> > >>>>>>>>>>>> The problem is the way this crashes: the TPM seems to be
> > >>>>>>>>>>>> unrecoverable. If it were recoverable without a hard reset of
> > >>>>>>>>>>>> the entire machine, we could certainly play around with it.  I
> > >>>>>>>>>>>> can try alternative mechanisms to see if anything's viable, but
> > >>>>>>>>>>>> to all intents and purposes, it looks like my TPM simply stops
> > >>>>>>>>>>>> responding to the TIS interface.
> > >>>>>>>>>>> 
> > >>>>>>>>>>> A quickly scraped idea probably with some holes in it but I was
> > >>>>>>>>>>> thinking something like
> > >>>>>>>>>>> 
> > >>>>>>>>>>> 1. Initially set slow value for latency, this could be the
> > >>>>>>>>>>> original 15 ms.
> > >>>>>>>>>>> 2. Use this to read TPM_PT_VENDOR_STRING_*.
> > >>>>>>>>>>> 3. Lookup based vendor string from a fixup table a latency that
> > >>>>>>>>>>> works
> > >>>>>>>>>>> (the fallback latency could be the existing latency).
> > >>>>>>>>>> 
> > >>>>>>>>>> Well, yes, that was sort of what I was thinking of doing for the
> > >>>>>>>>>> Atmel ... except I was thinking of using the TIS VID (16 byte
> > >>>>>>>>>> assigned vendor ID) which means we can get the information to set
> > >>>>>>>>>> the timeout before we have to do any TPM operations.
> > >>>>>>>>> 
> > >>>>>>>>> I wonder if the timeout issue exists for all TPM commands for the
> > >>>>>>>>> same manufacturer.  For example, does the ATMEL TPM also crash when 
> > >>>>>>>>> extending  PCRs ?
> > >>>>>>>>> 
> > >>>>>>>>> In addition to defining a per TPM vendor based lookup table for
> > >>>>>>>>> timeout, would it be a good idea to also define a Kconfig/boot param
> > >>>>>>>>> option to allow timeout setting.  This will enable to set the timeout
> > >>>>>>>>> based on the specific use.
> > >>>>>>>> 
> > >>>>>>>> I don't think we need go that far (yet).  The timing change has been in
> > >>>>>>>> upstream since:
> > >>>>>>>> 
> > >>>>>>>> commit 424eaf910c329ab06ad03a527ef45dcf6a328f00
> > >>>>>>>> Author: Nayna Jain <nayna@linux.vnet.ibm.com>
> > >>>>>>>> Date:   Wed May 16 01:51:25 2018 -0400
> > >>>>>>>> 
> > >>>>>>>> tpm: reduce polling time to usecs for even finer granularity
> > >>>>>>>> 
> > >>>>>>>> Which was in the released kernel 4.18: over two years ago.  In all that
> > >>>>>>>> time we've discovered two problems: mine which looks to be an artifact
> > >>>>>>>> of an experimental upgrade process in a new nuvoton and the Atmel. 
> > >>>>>>>> That means pretty much every other TPM simply works with the existing
> > >>>>>>>> timings
> > >>>>>>>> 
> > >>>>>>>>> I was also thinking how will we decide the lookup table values for
> > >>>>>>>>> each vendor ?
> > >>>>>>>> 
> > >>>>>>>> I wasn't thinking we would.  I was thinking I'd do a simple exception
> > >>>>>>>> for the Atmel and nothing else.  I don't think my Nuvoton is in any way
> > >>>>>>>> characteristic.  Indeed my pluggable TPM rainbow bridge system works
> > >>>>>>>> just fine with a Nuvoton and the current timings.
> > >>>>>>>> 
> > >>>>>>>> We can add additional exceptions if they actually turn up.
> > >>>>>>> 
> > >>>>>>> I'd add a table and fallback.
> > >>>>>>> 
> > >>>>>> 
> > >>>>>> Hi folks,
> > >>>>>> 
> > >>>>>> I want to follow up this a bit and check whether we reached a consensus 
> > >>>>>> on how to fix the timeout issue for Atmel chip.
> > >>>>>> 
> > >>>>>> Should we revert the changes or introduce the lookup table for chips.
> > >>>>>> 
> > >>>>>> Is there anything I can help from Rubrik side.
> > >>>>>> 
> > >>>>>> Thanks
> > >>>>>> Hao
> > >>>>> 
> > >>>>> There is nothing to revert as the previous was not applied but I'm
> > >>>>> of course ready to review any new attempts.
> > >>>>> 
> > >>>> 
> > >>>> Hi Jarkko,
> > >>>> 
> > >>>> By “revert” I meant we revert the timeout value changes by applying
> > >>>> the patch I proposed, as the timeout value discussed does cause issues.
> > >>>> 
> > >>>> Why don’t we apply the patch and improve the perf in the way of not
> > >>>> breaking TPMs ? 
> > >>>> 
> > >>>> Hao
> > >>> 
> > >>> Hi Jarkko and folks,
> > >>> 
> > >>> It’s being a while since our last discussion. I want to push a fix in the upstream for ateml chip. 
> > >>> It looks like we currently have following choices:
> > >>> 1.  generic fix for all vendors: have a lookup table for sleep time of wait_for_tpm_stat 
> > >>> (i.e. TPM_TIMEOUT_WAIT_STAT in my proposed patch) 
> > >>> 2.  quick fix for the regression: change the sleep time of wait_for_tpm_stat back to 15ms.
> > >>> It is the current proposed patch
> > >>> 3. Fix regression by making exception for ateml chip.  
> > >>> 
> > >>> Should we reach consensus on which one we want to pursue before dig
> > >>> into implementation of the patch? In my opinion, I prefer to fix the
> > >>> regression with 2, and then pursue 1 as long-term solution. 3 is
> > >>> hacky.
> > >> 
> > >> What does option 1 fix for *all* vendors?
> > >> 
> > >>> Let me know what do you guys think
> > >>> 
> > >>> Hao
> > >> 
> > >> /Jarkko
> > > 
> > > Hi Jarkko and folks,
> > > 
> > > It has been a while again. In my previous message I answered Jarkko’s question about the option 1.
> > > Jarkko, let me know if it is clear to you or you have further questions and suggestions on next to do.
> > > Somehow I couldn’t found the last message I sent but it is in 
> > > https://patchwork.kernel.org/project/linux-integrity/patch/20200926223150.109645-1-hao.wu@rubrik.com/
> > > 
> > > In high-level, the option 1 is to add a timing lookup table for each manufacture, hence we can
> > > configure timing for each chip respectively. Then we don’t need to worry about fixing ATMEL
> > > timing may cause performance degradation for other chips.
> > > 
> > > I do want to push the fix in TPM driver, which is likely to be hit going forward again when people are doing
> > > refactoring without testing chips from all manufacturing.
> > > 
> > > Let me know how should I push this forward.
> > > 
> > > Thanks
> > > Hao
> > > 
> > It looks like Jarkko’s email address (jarkko.sakkinen@linux.intel.com) is unreachable now,
> > can other TPM maintainer / reviewer help make a call and unblock this ? 
> 
> A while ago Jarkko asked everyone to use his kernel.org address.

Hao, I cannot really say that much about patch that does not exist.

The whole consesus thing based on a plan is just semantically wrong
way to look at things. If you have an idea for a patch, make your
own choice and just send the patch.

/Jarkko

^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: [PATCH] Fix Atmel TPM crash caused by too frequent queries
  2021-05-10  2:17                                                   ` Mimi Zohar
@ 2021-05-10  3:15                                                     ` Hao Wu
  2021-05-10 17:28                                                     ` Jarkko Sakkinen
  1 sibling, 0 replies; 91+ messages in thread
From: Hao Wu @ 2021-05-10  3:15 UTC (permalink / raw)
  To: Mimi Zohar, jarkko
  Cc: James Bottomley, peterhuewe, jgg, Nayna, arnd, gregkh,
	Hamza Attak, why2jjj.linux, zohar, linux-integrity, Paul Menzel,
	Ken Goldman, Seungyeop Han, Shrihari Kalkar, Anish Jhaveri,
	Jarkko Sakkinen

> On May 9, 2021, at 7:17 PM, Mimi Zohar <zohar@linux.ibm.com> wrote:
> 
> [Cc'ing Jarkko Sakkinen <jarkko@kernel.org>]
> 
> On Sat, 2021-05-08 at 23:31 -0700, Hao Wu wrote:
>>> On May 8, 2021, at 11:18 PM, Hao Wu <hao.wu@rubrik.com> wrote:
>>> 
>>>> On Nov 18, 2020, at 1:11 PM, Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com> wrote:
>>>> 
>>>> On Fri, Nov 13, 2020 at 08:39:28PM -0800, Hao Wu wrote:
>>>>>> On Oct 17, 2020, at 10:20 PM, Hao Wu <hao.wu@rubrik.com> wrote:
>>>>>> 
>>>>>>> On Oct 17, 2020, at 10:09 PM, Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com> wrote:
>>>>>>> 
>>>>>>> On Fri, Oct 16, 2020 at 11:11:37PM -0700, Hao Wu wrote:
>>>>>>>>> On Oct 1, 2020, at 4:04 PM, Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com> wrote:
>>>>>>>>> 
>>>>>>>>> On Thu, Oct 01, 2020 at 11:32:59AM -0700, James Bottomley wrote:
>>>>>>>>>> On Thu, 2020-10-01 at 14:15 -0400, Nayna wrote:
>>>>>>>>>>> On 10/1/20 12:53 AM, James Bottomley wrote:
>>>>>>>>>>>> On Thu, 2020-10-01 at 04:50 +0300, Jarkko Sakkinen wrote:
>>>>>>>>>>>>> On Wed, Sep 30, 2020 at 03:31:20PM -0700, James Bottomley wrote:
>>>>>>>>>>>>>> On Thu, 2020-10-01 at 00:09 +0300, Jarkko Sakkinen wrote:
>>>>>>>>>> [...]
>>>>>>>>>>>>>>> I also wonder if we could adjust the frequency dynamically.
>>>>>>>>>>>>>>> I.e. start with optimistic value and lower it until finding
>>>>>>>>>>>>>>> the sweet spot.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> The problem is the way this crashes: the TPM seems to be
>>>>>>>>>>>>>> unrecoverable. If it were recoverable without a hard reset of
>>>>>>>>>>>>>> the entire machine, we could certainly play around with it.  I
>>>>>>>>>>>>>> can try alternative mechanisms to see if anything's viable, but
>>>>>>>>>>>>>> to all intents and purposes, it looks like my TPM simply stops
>>>>>>>>>>>>>> responding to the TIS interface.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> A quickly scraped idea probably with some holes in it but I was
>>>>>>>>>>>>> thinking something like
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 1. Initially set slow value for latency, this could be the
>>>>>>>>>>>>> original 15 ms.
>>>>>>>>>>>>> 2. Use this to read TPM_PT_VENDOR_STRING_*.
>>>>>>>>>>>>> 3. Lookup based vendor string from a fixup table a latency that
>>>>>>>>>>>>> works
>>>>>>>>>>>>> (the fallback latency could be the existing latency).
>>>>>>>>>>>> 
>>>>>>>>>>>> Well, yes, that was sort of what I was thinking of doing for the
>>>>>>>>>>>> Atmel ... except I was thinking of using the TIS VID (16 byte
>>>>>>>>>>>> assigned vendor ID) which means we can get the information to set
>>>>>>>>>>>> the timeout before we have to do any TPM operations.
>>>>>>>>>>> 
>>>>>>>>>>> I wonder if the timeout issue exists for all TPM commands for the
>>>>>>>>>>> same manufacturer.  For example, does the ATMEL TPM also crash when 
>>>>>>>>>>> extending  PCRs ?
>>>>>>>>>>> 
>>>>>>>>>>> In addition to defining a per TPM vendor based lookup table for
>>>>>>>>>>> timeout, would it be a good idea to also define a Kconfig/boot param
>>>>>>>>>>> option to allow timeout setting.  This will enable to set the timeout
>>>>>>>>>>> based on the specific use.
>>>>>>>>>> 
>>>>>>>>>> I don't think we need go that far (yet).  The timing change has been in
>>>>>>>>>> upstream since:
>>>>>>>>>> 
>>>>>>>>>> commit 424eaf910c329ab06ad03a527ef45dcf6a328f00
>>>>>>>>>> Author: Nayna Jain <nayna@linux.vnet.ibm.com>
>>>>>>>>>> Date:   Wed May 16 01:51:25 2018 -0400
>>>>>>>>>> 
>>>>>>>>>> tpm: reduce polling time to usecs for even finer granularity
>>>>>>>>>> 
>>>>>>>>>> Which was in the released kernel 4.18: over two years ago.  In all that
>>>>>>>>>> time we've discovered two problems: mine which looks to be an artifact
>>>>>>>>>> of an experimental upgrade process in a new nuvoton and the Atmel. 
>>>>>>>>>> That means pretty much every other TPM simply works with the existing
>>>>>>>>>> timings
>>>>>>>>>> 
>>>>>>>>>>> I was also thinking how will we decide the lookup table values for
>>>>>>>>>>> each vendor ?
>>>>>>>>>> 
>>>>>>>>>> I wasn't thinking we would.  I was thinking I'd do a simple exception
>>>>>>>>>> for the Atmel and nothing else.  I don't think my Nuvoton is in any way
>>>>>>>>>> characteristic.  Indeed my pluggable TPM rainbow bridge system works
>>>>>>>>>> just fine with a Nuvoton and the current timings.
>>>>>>>>>> 
>>>>>>>>>> We can add additional exceptions if they actually turn up.
>>>>>>>>> 
>>>>>>>>> I'd add a table and fallback.
>>>>>>>>> 
>>>>>>>> 
>>>>>>>> Hi folks,
>>>>>>>> 
>>>>>>>> I want to follow up this a bit and check whether we reached a consensus 
>>>>>>>> on how to fix the timeout issue for Atmel chip.
>>>>>>>> 
>>>>>>>> Should we revert the changes or introduce the lookup table for chips.
>>>>>>>> 
>>>>>>>> Is there anything I can help from Rubrik side.
>>>>>>>> 
>>>>>>>> Thanks
>>>>>>>> Hao
>>>>>>> 
>>>>>>> There is nothing to revert as the previous was not applied but I'm
>>>>>>> of course ready to review any new attempts.
>>>>>>> 
>>>>>> 
>>>>>> Hi Jarkko,
>>>>>> 
>>>>>> By “revert” I meant we revert the timeout value changes by applying
>>>>>> the patch I proposed, as the timeout value discussed does cause issues.
>>>>>> 
>>>>>> Why don’t we apply the patch and improve the perf in the way of not
>>>>>> breaking TPMs ? 
>>>>>> 
>>>>>> Hao
>>>>> 
>>>>> Hi Jarkko and folks,
>>>>> 
>>>>> It’s being a while since our last discussion. I want to push a fix in the upstream for ateml chip. 
>>>>> It looks like we currently have following choices:
>>>>> 1.  generic fix for all vendors: have a lookup table for sleep time of wait_for_tpm_stat 
>>>>> (i.e. TPM_TIMEOUT_WAIT_STAT in my proposed patch) 
>>>>> 2.  quick fix for the regression: change the sleep time of wait_for_tpm_stat back to 15ms.
>>>>> It is the current proposed patch
>>>>> 3. Fix regression by making exception for ateml chip.  
>>>>> 
>>>>> Should we reach consensus on which one we want to pursue before dig
>>>>> into implementation of the patch? In my opinion, I prefer to fix the
>>>>> regression with 2, and then pursue 1 as long-term solution. 3 is
>>>>> hacky.
>>>> 
>>>> What does option 1 fix for *all* vendors?
>>>> 
>>>>> Let me know what do you guys think
>>>>> 
>>>>> Hao
>>>> 
>>>> /Jarkko
>>> 
>>> Hi Jarkko and folks,
>>> 
>>> It has been a while again. In my previous message I answered Jarkko’s question about the option 1.
>>> Jarkko, let me know if it is clear to you or you have further questions and suggestions on next to do.
>>> Somehow I couldn’t found the last message I sent but it is in 
>>> https://patchwork.kernel.org/project/linux-integrity/patch/20200926223150.109645-1-hao.wu@rubrik.com/
>>> 
>>> In high-level, the option 1 is to add a timing lookup table for each manufacture, hence we can
>>> configure timing for each chip respectively. Then we don’t need to worry about fixing ATMEL
>>> timing may cause performance degradation for other chips.
>>> 
>>> I do want to push the fix in TPM driver, which is likely to be hit going forward again when people are doing
>>> refactoring without testing chips from all manufacturing.
>>> 
>>> Let me know how should I push this forward.
>>> 
>>> Thanks
>>> Hao
>>> 
>> It looks like Jarkko’s email address (jarkko.sakkinen@linux.intel.com) is unreachable now,
>> can other TPM maintainer / reviewer help make a call and unblock this ? 
> 
> A while ago Jarkko asked everyone to use his kernel.org address.
> 
> Mimi

Ah thanks Mimi, just found Jarkko’s address.

Jarkko please check the message above when you have a chance.

Hao


^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: [PATCH] Fix Atmel TPM crash caused by too frequent queries
  2021-05-09  6:31                                                 ` Hao Wu
@ 2021-05-10  2:17                                                   ` Mimi Zohar
  2021-05-10  3:15                                                     ` Hao Wu
  2021-05-10 17:28                                                     ` Jarkko Sakkinen
  0 siblings, 2 replies; 91+ messages in thread
From: Mimi Zohar @ 2021-05-10  2:17 UTC (permalink / raw)
  To: Hao Wu, Jarkko Sakkinen, James Bottomley, peterhuewe, jgg
  Cc: Nayna, arnd, gregkh, Hamza Attak, why2jjj.linux, zohar,
	linux-integrity, Paul Menzel, Ken Goldman, Seungyeop Han,
	Shrihari Kalkar, Anish Jhaveri, Jarkko Sakkinen

[Cc'ing Jarkko Sakkinen <jarkko@kernel.org>]

On Sat, 2021-05-08 at 23:31 -0700, Hao Wu wrote:
> > On May 8, 2021, at 11:18 PM, Hao Wu <hao.wu@rubrik.com> wrote:
> > 
> >> On Nov 18, 2020, at 1:11 PM, Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com> wrote:
> >> 
> >> On Fri, Nov 13, 2020 at 08:39:28PM -0800, Hao Wu wrote:
> >>>> On Oct 17, 2020, at 10:20 PM, Hao Wu <hao.wu@rubrik.com> wrote:
> >>>> 
> >>>>> On Oct 17, 2020, at 10:09 PM, Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com> wrote:
> >>>>> 
> >>>>> On Fri, Oct 16, 2020 at 11:11:37PM -0700, Hao Wu wrote:
> >>>>>>> On Oct 1, 2020, at 4:04 PM, Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com> wrote:
> >>>>>>> 
> >>>>>>> On Thu, Oct 01, 2020 at 11:32:59AM -0700, James Bottomley wrote:
> >>>>>>>> On Thu, 2020-10-01 at 14:15 -0400, Nayna wrote:
> >>>>>>>>> On 10/1/20 12:53 AM, James Bottomley wrote:
> >>>>>>>>>> On Thu, 2020-10-01 at 04:50 +0300, Jarkko Sakkinen wrote:
> >>>>>>>>>>> On Wed, Sep 30, 2020 at 03:31:20PM -0700, James Bottomley wrote:
> >>>>>>>>>>>> On Thu, 2020-10-01 at 00:09 +0300, Jarkko Sakkinen wrote:
> >>>>>>>> [...]
> >>>>>>>>>>>>> I also wonder if we could adjust the frequency dynamically.
> >>>>>>>>>>>>> I.e. start with optimistic value and lower it until finding
> >>>>>>>>>>>>> the sweet spot.
> >>>>>>>>>>>> 
> >>>>>>>>>>>> The problem is the way this crashes: the TPM seems to be
> >>>>>>>>>>>> unrecoverable. If it were recoverable without a hard reset of
> >>>>>>>>>>>> the entire machine, we could certainly play around with it.  I
> >>>>>>>>>>>> can try alternative mechanisms to see if anything's viable, but
> >>>>>>>>>>>> to all intents and purposes, it looks like my TPM simply stops
> >>>>>>>>>>>> responding to the TIS interface.
> >>>>>>>>>>> 
> >>>>>>>>>>> A quickly scraped idea probably with some holes in it but I was
> >>>>>>>>>>> thinking something like
> >>>>>>>>>>> 
> >>>>>>>>>>> 1. Initially set slow value for latency, this could be the
> >>>>>>>>>>> original 15 ms.
> >>>>>>>>>>> 2. Use this to read TPM_PT_VENDOR_STRING_*.
> >>>>>>>>>>> 3. Lookup based vendor string from a fixup table a latency that
> >>>>>>>>>>> works
> >>>>>>>>>>> (the fallback latency could be the existing latency).
> >>>>>>>>>> 
> >>>>>>>>>> Well, yes, that was sort of what I was thinking of doing for the
> >>>>>>>>>> Atmel ... except I was thinking of using the TIS VID (16 byte
> >>>>>>>>>> assigned vendor ID) which means we can get the information to set
> >>>>>>>>>> the timeout before we have to do any TPM operations.
> >>>>>>>>> 
> >>>>>>>>> I wonder if the timeout issue exists for all TPM commands for the
> >>>>>>>>> same manufacturer.  For example, does the ATMEL TPM also crash when 
> >>>>>>>>> extending  PCRs ?
> >>>>>>>>> 
> >>>>>>>>> In addition to defining a per TPM vendor based lookup table for
> >>>>>>>>> timeout, would it be a good idea to also define a Kconfig/boot param
> >>>>>>>>> option to allow timeout setting.  This will enable to set the timeout
> >>>>>>>>> based on the specific use.
> >>>>>>>> 
> >>>>>>>> I don't think we need go that far (yet).  The timing change has been in
> >>>>>>>> upstream since:
> >>>>>>>> 
> >>>>>>>> commit 424eaf910c329ab06ad03a527ef45dcf6a328f00
> >>>>>>>> Author: Nayna Jain <nayna@linux.vnet.ibm.com>
> >>>>>>>> Date:   Wed May 16 01:51:25 2018 -0400
> >>>>>>>> 
> >>>>>>>> tpm: reduce polling time to usecs for even finer granularity
> >>>>>>>> 
> >>>>>>>> Which was in the released kernel 4.18: over two years ago.  In all that
> >>>>>>>> time we've discovered two problems: mine which looks to be an artifact
> >>>>>>>> of an experimental upgrade process in a new nuvoton and the Atmel. 
> >>>>>>>> That means pretty much every other TPM simply works with the existing
> >>>>>>>> timings
> >>>>>>>> 
> >>>>>>>>> I was also thinking how will we decide the lookup table values for
> >>>>>>>>> each vendor ?
> >>>>>>>> 
> >>>>>>>> I wasn't thinking we would.  I was thinking I'd do a simple exception
> >>>>>>>> for the Atmel and nothing else.  I don't think my Nuvoton is in any way
> >>>>>>>> characteristic.  Indeed my pluggable TPM rainbow bridge system works
> >>>>>>>> just fine with a Nuvoton and the current timings.
> >>>>>>>> 
> >>>>>>>> We can add additional exceptions if they actually turn up.
> >>>>>>> 
> >>>>>>> I'd add a table and fallback.
> >>>>>>> 
> >>>>>> 
> >>>>>> Hi folks,
> >>>>>> 
> >>>>>> I want to follow up this a bit and check whether we reached a consensus 
> >>>>>> on how to fix the timeout issue for Atmel chip.
> >>>>>> 
> >>>>>> Should we revert the changes or introduce the lookup table for chips.
> >>>>>> 
> >>>>>> Is there anything I can help from Rubrik side.
> >>>>>> 
> >>>>>> Thanks
> >>>>>> Hao
> >>>>> 
> >>>>> There is nothing to revert as the previous was not applied but I'm
> >>>>> of course ready to review any new attempts.
> >>>>> 
> >>>> 
> >>>> Hi Jarkko,
> >>>> 
> >>>> By “revert” I meant we revert the timeout value changes by applying
> >>>> the patch I proposed, as the timeout value discussed does cause issues.
> >>>> 
> >>>> Why don’t we apply the patch and improve the perf in the way of not
> >>>> breaking TPMs ? 
> >>>> 
> >>>> Hao
> >>> 
> >>> Hi Jarkko and folks,
> >>> 
> >>> It’s being a while since our last discussion. I want to push a fix in the upstream for ateml chip. 
> >>> It looks like we currently have following choices:
> >>> 1.  generic fix for all vendors: have a lookup table for sleep time of wait_for_tpm_stat 
> >>> (i.e. TPM_TIMEOUT_WAIT_STAT in my proposed patch) 
> >>> 2.  quick fix for the regression: change the sleep time of wait_for_tpm_stat back to 15ms.
> >>> It is the current proposed patch
> >>> 3. Fix regression by making exception for ateml chip.  
> >>> 
> >>> Should we reach consensus on which one we want to pursue before dig
> >>> into implementation of the patch? In my opinion, I prefer to fix the
> >>> regression with 2, and then pursue 1 as long-term solution. 3 is
> >>> hacky.
> >> 
> >> What does option 1 fix for *all* vendors?
> >> 
> >>> Let me know what do you guys think
> >>> 
> >>> Hao
> >> 
> >> /Jarkko
> > 
> > Hi Jarkko and folks,
> > 
> > It has been a while again. In my previous message I answered Jarkko’s question about the option 1.
> > Jarkko, let me know if it is clear to you or you have further questions and suggestions on next to do.
> > Somehow I couldn’t found the last message I sent but it is in 
> > https://patchwork.kernel.org/project/linux-integrity/patch/20200926223150.109645-1-hao.wu@rubrik.com/
> > 
> > In high-level, the option 1 is to add a timing lookup table for each manufacture, hence we can
> > configure timing for each chip respectively. Then we don’t need to worry about fixing ATMEL
> > timing may cause performance degradation for other chips.
> > 
> > I do want to push the fix in TPM driver, which is likely to be hit going forward again when people are doing
> > refactoring without testing chips from all manufacturing.
> > 
> > Let me know how should I push this forward.
> > 
> > Thanks
> > Hao
> > 
> It looks like Jarkko’s email address (jarkko.sakkinen@linux.intel.com) is unreachable now,
> can other TPM maintainer / reviewer help make a call and unblock this ? 

A while ago Jarkko asked everyone to use his kernel.org address.

Mimi


^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: [PATCH] Fix Atmel TPM crash caused by too frequent queries
  2021-05-09  6:18                                               ` Hao Wu
@ 2021-05-09  6:31                                                 ` Hao Wu
  2021-05-10  2:17                                                   ` Mimi Zohar
  0 siblings, 1 reply; 91+ messages in thread
From: Hao Wu @ 2021-05-09  6:31 UTC (permalink / raw)
  To: Jarkko Sakkinen, James Bottomley, peterhuewe, jgg
  Cc: Nayna, arnd, gregkh, Hamza Attak, why2jjj.linux, zohar,
	linux-integrity, Paul Menzel, Ken Goldman, Seungyeop Han,
	Shrihari Kalkar, Anish Jhaveri

> On May 8, 2021, at 11:18 PM, Hao Wu <hao.wu@rubrik.com> wrote:
> 
>> On Nov 18, 2020, at 1:11 PM, Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com> wrote:
>> 
>> On Fri, Nov 13, 2020 at 08:39:28PM -0800, Hao Wu wrote:
>>>> On Oct 17, 2020, at 10:20 PM, Hao Wu <hao.wu@rubrik.com> wrote:
>>>> 
>>>>> On Oct 17, 2020, at 10:09 PM, Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com> wrote:
>>>>> 
>>>>> On Fri, Oct 16, 2020 at 11:11:37PM -0700, Hao Wu wrote:
>>>>>>> On Oct 1, 2020, at 4:04 PM, Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com> wrote:
>>>>>>> 
>>>>>>> On Thu, Oct 01, 2020 at 11:32:59AM -0700, James Bottomley wrote:
>>>>>>>> On Thu, 2020-10-01 at 14:15 -0400, Nayna wrote:
>>>>>>>>> On 10/1/20 12:53 AM, James Bottomley wrote:
>>>>>>>>>> On Thu, 2020-10-01 at 04:50 +0300, Jarkko Sakkinen wrote:
>>>>>>>>>>> On Wed, Sep 30, 2020 at 03:31:20PM -0700, James Bottomley wrote:
>>>>>>>>>>>> On Thu, 2020-10-01 at 00:09 +0300, Jarkko Sakkinen wrote:
>>>>>>>> [...]
>>>>>>>>>>>>> I also wonder if we could adjust the frequency dynamically.
>>>>>>>>>>>>> I.e. start with optimistic value and lower it until finding
>>>>>>>>>>>>> the sweet spot.
>>>>>>>>>>>> 
>>>>>>>>>>>> The problem is the way this crashes: the TPM seems to be
>>>>>>>>>>>> unrecoverable. If it were recoverable without a hard reset of
>>>>>>>>>>>> the entire machine, we could certainly play around with it.  I
>>>>>>>>>>>> can try alternative mechanisms to see if anything's viable, but
>>>>>>>>>>>> to all intents and purposes, it looks like my TPM simply stops
>>>>>>>>>>>> responding to the TIS interface.
>>>>>>>>>>> 
>>>>>>>>>>> A quickly scraped idea probably with some holes in it but I was
>>>>>>>>>>> thinking something like
>>>>>>>>>>> 
>>>>>>>>>>> 1. Initially set slow value for latency, this could be the
>>>>>>>>>>> original 15 ms.
>>>>>>>>>>> 2. Use this to read TPM_PT_VENDOR_STRING_*.
>>>>>>>>>>> 3. Lookup based vendor string from a fixup table a latency that
>>>>>>>>>>> works
>>>>>>>>>>> (the fallback latency could be the existing latency).
>>>>>>>>>> 
>>>>>>>>>> Well, yes, that was sort of what I was thinking of doing for the
>>>>>>>>>> Atmel ... except I was thinking of using the TIS VID (16 byte
>>>>>>>>>> assigned vendor ID) which means we can get the information to set
>>>>>>>>>> the timeout before we have to do any TPM operations.
>>>>>>>>> 
>>>>>>>>> I wonder if the timeout issue exists for all TPM commands for the
>>>>>>>>> same manufacturer.  For example, does the ATMEL TPM also crash when 
>>>>>>>>> extending  PCRs ?
>>>>>>>>> 
>>>>>>>>> In addition to defining a per TPM vendor based lookup table for
>>>>>>>>> timeout, would it be a good idea to also define a Kconfig/boot param
>>>>>>>>> option to allow timeout setting.  This will enable to set the timeout
>>>>>>>>> based on the specific use.
>>>>>>>> 
>>>>>>>> I don't think we need go that far (yet).  The timing change has been in
>>>>>>>> upstream since:
>>>>>>>> 
>>>>>>>> commit 424eaf910c329ab06ad03a527ef45dcf6a328f00
>>>>>>>> Author: Nayna Jain <nayna@linux.vnet.ibm.com>
>>>>>>>> Date:   Wed May 16 01:51:25 2018 -0400
>>>>>>>> 
>>>>>>>> tpm: reduce polling time to usecs for even finer granularity
>>>>>>>> 
>>>>>>>> Which was in the released kernel 4.18: over two years ago.  In all that
>>>>>>>> time we've discovered two problems: mine which looks to be an artifact
>>>>>>>> of an experimental upgrade process in a new nuvoton and the Atmel. 
>>>>>>>> That means pretty much every other TPM simply works with the existing
>>>>>>>> timings
>>>>>>>> 
>>>>>>>>> I was also thinking how will we decide the lookup table values for
>>>>>>>>> each vendor ?
>>>>>>>> 
>>>>>>>> I wasn't thinking we would.  I was thinking I'd do a simple exception
>>>>>>>> for the Atmel and nothing else.  I don't think my Nuvoton is in any way
>>>>>>>> characteristic.  Indeed my pluggable TPM rainbow bridge system works
>>>>>>>> just fine with a Nuvoton and the current timings.
>>>>>>>> 
>>>>>>>> We can add additional exceptions if they actually turn up.
>>>>>>> 
>>>>>>> I'd add a table and fallback.
>>>>>>> 
>>>>>> 
>>>>>> Hi folks,
>>>>>> 
>>>>>> I want to follow up this a bit and check whether we reached a consensus 
>>>>>> on how to fix the timeout issue for Atmel chip.
>>>>>> 
>>>>>> Should we revert the changes or introduce the lookup table for chips.
>>>>>> 
>>>>>> Is there anything I can help from Rubrik side.
>>>>>> 
>>>>>> Thanks
>>>>>> Hao
>>>>> 
>>>>> There is nothing to revert as the previous was not applied but I'm
>>>>> of course ready to review any new attempts.
>>>>> 
>>>> 
>>>> Hi Jarkko,
>>>> 
>>>> By “revert” I meant we revert the timeout value changes by applying
>>>> the patch I proposed, as the timeout value discussed does cause issues.
>>>> 
>>>> Why don’t we apply the patch and improve the perf in the way of not
>>>> breaking TPMs ? 
>>>> 
>>>> Hao
>>> 
>>> Hi Jarkko and folks,
>>> 
>>> It’s being a while since our last discussion. I want to push a fix in the upstream for ateml chip. 
>>> It looks like we currently have following choices:
>>> 1.  generic fix for all vendors: have a lookup table for sleep time of wait_for_tpm_stat 
>>> (i.e. TPM_TIMEOUT_WAIT_STAT in my proposed patch) 
>>> 2.  quick fix for the regression: change the sleep time of wait_for_tpm_stat back to 15ms.
>>> It is the current proposed patch
>>> 3. Fix regression by making exception for ateml chip.  
>>> 
>>> Should we reach consensus on which one we want to pursue before dig
>>> into implementation of the patch? In my opinion, I prefer to fix the
>>> regression with 2, and then pursue 1 as long-term solution. 3 is
>>> hacky.
>> 
>> What does option 1 fix for *all* vendors?
>> 
>>> Let me know what do you guys think
>>> 
>>> Hao
>> 
>> /Jarkko
> 
> Hi Jarkko and folks,
> 
> It has been a while again. In my previous message I answered Jarkko’s question about the option 1.
> Jarkko, let me know if it is clear to you or you have further questions and suggestions on next to do.
> Somehow I couldn’t found the last message I sent but it is in 
> https://patchwork.kernel.org/project/linux-integrity/patch/20200926223150.109645-1-hao.wu@rubrik.com/
> 
> In high-level, the option 1 is to add a timing lookup table for each manufacture, hence we can
> configure timing for each chip respectively. Then we don’t need to worry about fixing ATMEL
> timing may cause performance degradation for other chips.
> 
> I do want to push the fix in TPM driver, which is likely to be hit going forward again when people are doing
> refactoring without testing chips from all manufacturing.
> 
> Let me know how should I push this forward.
> 
> Thanks
> Hao
> 
It looks like Jarkko’s email address (jarkko.sakkinen@linux.intel.com) is unreachable now,
can other TPM maintainer / reviewer help make a call and unblock this ? 

Thanks
Hao

^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: [PATCH] Fix Atmel TPM crash caused by too frequent queries
  2020-11-18 21:11                                             ` Jarkko Sakkinen
  2020-11-18 23:23                                               ` Hao Wu
@ 2021-05-09  6:18                                               ` Hao Wu
  2021-05-09  6:31                                                 ` Hao Wu
  1 sibling, 1 reply; 91+ messages in thread
From: Hao Wu @ 2021-05-09  6:18 UTC (permalink / raw)
  To: Jarkko Sakkinen
  Cc: James Bottomley, Nayna, peterhuewe, jgg, arnd, gregkh,
	Hamza Attak, why2jjj.linux, zohar, linux-integrity, Paul Menzel,
	Ken Goldman, Seungyeop Han, Shrihari Kalkar, Anish Jhaveri

> On Nov 18, 2020, at 1:11 PM, Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com> wrote:
> 
> On Fri, Nov 13, 2020 at 08:39:28PM -0800, Hao Wu wrote:
>>> On Oct 17, 2020, at 10:20 PM, Hao Wu <hao.wu@rubrik.com> wrote:
>>> 
>>>> On Oct 17, 2020, at 10:09 PM, Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com> wrote:
>>>> 
>>>> On Fri, Oct 16, 2020 at 11:11:37PM -0700, Hao Wu wrote:
>>>>>> On Oct 1, 2020, at 4:04 PM, Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com> wrote:
>>>>>> 
>>>>>> On Thu, Oct 01, 2020 at 11:32:59AM -0700, James Bottomley wrote:
>>>>>>> On Thu, 2020-10-01 at 14:15 -0400, Nayna wrote:
>>>>>>>> On 10/1/20 12:53 AM, James Bottomley wrote:
>>>>>>>>> On Thu, 2020-10-01 at 04:50 +0300, Jarkko Sakkinen wrote:
>>>>>>>>>> On Wed, Sep 30, 2020 at 03:31:20PM -0700, James Bottomley wrote:
>>>>>>>>>>> On Thu, 2020-10-01 at 00:09 +0300, Jarkko Sakkinen wrote:
>>>>>>> [...]
>>>>>>>>>>>> I also wonder if we could adjust the frequency dynamically.
>>>>>>>>>>>> I.e. start with optimistic value and lower it until finding
>>>>>>>>>>>> the sweet spot.
>>>>>>>>>>> 
>>>>>>>>>>> The problem is the way this crashes: the TPM seems to be
>>>>>>>>>>> unrecoverable. If it were recoverable without a hard reset of
>>>>>>>>>>> the entire machine, we could certainly play around with it.  I
>>>>>>>>>>> can try alternative mechanisms to see if anything's viable, but
>>>>>>>>>>> to all intents and purposes, it looks like my TPM simply stops
>>>>>>>>>>> responding to the TIS interface.
>>>>>>>>>> 
>>>>>>>>>> A quickly scraped idea probably with some holes in it but I was
>>>>>>>>>> thinking something like
>>>>>>>>>> 
>>>>>>>>>> 1. Initially set slow value for latency, this could be the
>>>>>>>>>> original 15 ms.
>>>>>>>>>> 2. Use this to read TPM_PT_VENDOR_STRING_*.
>>>>>>>>>> 3. Lookup based vendor string from a fixup table a latency that
>>>>>>>>>> works
>>>>>>>>>> (the fallback latency could be the existing latency).
>>>>>>>>> 
>>>>>>>>> Well, yes, that was sort of what I was thinking of doing for the
>>>>>>>>> Atmel ... except I was thinking of using the TIS VID (16 byte
>>>>>>>>> assigned vendor ID) which means we can get the information to set
>>>>>>>>> the timeout before we have to do any TPM operations.
>>>>>>>> 
>>>>>>>> I wonder if the timeout issue exists for all TPM commands for the
>>>>>>>> same manufacturer.  For example, does the ATMEL TPM also crash when 
>>>>>>>> extending  PCRs ?
>>>>>>>> 
>>>>>>>> In addition to defining a per TPM vendor based lookup table for
>>>>>>>> timeout, would it be a good idea to also define a Kconfig/boot param
>>>>>>>> option to allow timeout setting.  This will enable to set the timeout
>>>>>>>> based on the specific use.
>>>>>>> 
>>>>>>> I don't think we need go that far (yet).  The timing change has been in
>>>>>>> upstream since:
>>>>>>> 
>>>>>>> commit 424eaf910c329ab06ad03a527ef45dcf6a328f00
>>>>>>> Author: Nayna Jain <nayna@linux.vnet.ibm.com>
>>>>>>> Date:   Wed May 16 01:51:25 2018 -0400
>>>>>>> 
>>>>>>> tpm: reduce polling time to usecs for even finer granularity
>>>>>>> 
>>>>>>> Which was in the released kernel 4.18: over two years ago.  In all that
>>>>>>> time we've discovered two problems: mine which looks to be an artifact
>>>>>>> of an experimental upgrade process in a new nuvoton and the Atmel. 
>>>>>>> That means pretty much every other TPM simply works with the existing
>>>>>>> timings
>>>>>>> 
>>>>>>>> I was also thinking how will we decide the lookup table values for
>>>>>>>> each vendor ?
>>>>>>> 
>>>>>>> I wasn't thinking we would.  I was thinking I'd do a simple exception
>>>>>>> for the Atmel and nothing else.  I don't think my Nuvoton is in any way
>>>>>>> characteristic.  Indeed my pluggable TPM rainbow bridge system works
>>>>>>> just fine with a Nuvoton and the current timings.
>>>>>>> 
>>>>>>> We can add additional exceptions if they actually turn up.
>>>>>> 
>>>>>> I'd add a table and fallback.
>>>>>> 
>>>>> 
>>>>> Hi folks,
>>>>> 
>>>>> I want to follow up this a bit and check whether we reached a consensus 
>>>>> on how to fix the timeout issue for Atmel chip.
>>>>> 
>>>>> Should we revert the changes or introduce the lookup table for chips.
>>>>> 
>>>>> Is there anything I can help from Rubrik side.
>>>>> 
>>>>> Thanks
>>>>> Hao
>>>> 
>>>> There is nothing to revert as the previous was not applied but I'm
>>>> of course ready to review any new attempts.
>>>> 
>>> 
>>> Hi Jarkko,
>>> 
>>> By “revert” I meant we revert the timeout value changes by applying
>>> the patch I proposed, as the timeout value discussed does cause issues.
>>> 
>>> Why don’t we apply the patch and improve the perf in the way of not
>>> breaking TPMs ? 
>>> 
>>> Hao
>> 
>> Hi Jarkko and folks,
>> 
>> It’s being a while since our last discussion. I want to push a fix in the upstream for ateml chip. 
>> It looks like we currently have following choices:
>> 1.  generic fix for all vendors: have a lookup table for sleep time of wait_for_tpm_stat 
>>  (i.e. TPM_TIMEOUT_WAIT_STAT in my proposed patch) 
>> 2.  quick fix for the regression: change the sleep time of wait_for_tpm_stat back to 15ms.
>>  It is the current proposed patch
>> 3. Fix regression by making exception for ateml chip.  
>> 
>> Should we reach consensus on which one we want to pursue before dig
>> into implementation of the patch? In my opinion, I prefer to fix the
>> regression with 2, and then pursue 1 as long-term solution. 3 is
>> hacky.
> 
> What does option 1 fix for *all* vendors?
> 
>> Let me know what do you guys think
>> 
>> Hao
> 
> /Jarkko

Hi Jarkko and folks,

It has been a while again. In my previous message I answered Jarkko’s question about the option 1.
Jarkko, let me know if it is clear to you or you have further questions and suggestions on next to do.
Somehow I couldn’t found the last message I sent but it is in 
https://patchwork.kernel.org/project/linux-integrity/patch/20200926223150.109645-1-hao.wu@rubrik.com/

In high-level, the option 1 is to add a timing lookup table for each manufacture, hence we can
configure timing for each chip respectively. Then we don’t need to worry about fixing ATMEL
timing may cause performance degradation for other chips.

I do want to push the fix in TPM driver, which is likely to be hit going forward again when people are doing
refactoring without testing chips from all manufacturing.

Let me know how should I push this forward.

Thanks
Hao


^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: [PATCH] Fix Atmel TPM crash caused by too frequent queries
  2020-11-18 21:11                                             ` Jarkko Sakkinen
@ 2020-11-18 23:23                                               ` Hao Wu
  2021-05-09  6:18                                               ` Hao Wu
  1 sibling, 0 replies; 91+ messages in thread
From: Hao Wu @ 2020-11-18 23:23 UTC (permalink / raw)
  To: Jarkko Sakkinen
  Cc: James Bottomley, Nayna, peterhuewe, jgg, arnd, gregkh,
	Hamza Attak, why2jjj.linux, zohar, linux-integrity, Paul Menzel,
	Ken Goldman, Seungyeop Han, Shrihari Kalkar, Anish Jhaveri

> On Nov 18, 2020, at 1:11 PM, Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com> wrote:
> 
> On Fri, Nov 13, 2020 at 08:39:28PM -0800, Hao Wu wrote:
>>> On Oct 17, 2020, at 10:20 PM, Hao Wu <hao.wu@rubrik.com> wrote:
>>> 
>>>> On Oct 17, 2020, at 10:09 PM, Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com> wrote:
>>>> 
>>>> On Fri, Oct 16, 2020 at 11:11:37PM -0700, Hao Wu wrote:
>>>>>> On Oct 1, 2020, at 4:04 PM, Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com> wrote:
>>>>>> 
>>>>>> On Thu, Oct 01, 2020 at 11:32:59AM -0700, James Bottomley wrote:
>>>>>>> On Thu, 2020-10-01 at 14:15 -0400, Nayna wrote:
>>>>>>>> On 10/1/20 12:53 AM, James Bottomley wrote:
>>>>>>>>> On Thu, 2020-10-01 at 04:50 +0300, Jarkko Sakkinen wrote:
>>>>>>>>>> On Wed, Sep 30, 2020 at 03:31:20PM -0700, James Bottomley wrote:
>>>>>>>>>>> On Thu, 2020-10-01 at 00:09 +0300, Jarkko Sakkinen wrote:
>>>>>>> [...]
>>>>>>>>>>>> I also wonder if we could adjust the frequency dynamically.
>>>>>>>>>>>> I.e. start with optimistic value and lower it until finding
>>>>>>>>>>>> the sweet spot.
>>>>>>>>>>> 
>>>>>>>>>>> The problem is the way this crashes: the TPM seems to be
>>>>>>>>>>> unrecoverable. If it were recoverable without a hard reset of
>>>>>>>>>>> the entire machine, we could certainly play around with it.  I
>>>>>>>>>>> can try alternative mechanisms to see if anything's viable, but
>>>>>>>>>>> to all intents and purposes, it looks like my TPM simply stops
>>>>>>>>>>> responding to the TIS interface.
>>>>>>>>>> 
>>>>>>>>>> A quickly scraped idea probably with some holes in it but I was
>>>>>>>>>> thinking something like
>>>>>>>>>> 
>>>>>>>>>> 1. Initially set slow value for latency, this could be the
>>>>>>>>>> original 15 ms.
>>>>>>>>>> 2. Use this to read TPM_PT_VENDOR_STRING_*.
>>>>>>>>>> 3. Lookup based vendor string from a fixup table a latency that
>>>>>>>>>> works
>>>>>>>>>> (the fallback latency could be the existing latency).
>>>>>>>>> 
>>>>>>>>> Well, yes, that was sort of what I was thinking of doing for the
>>>>>>>>> Atmel ... except I was thinking of using the TIS VID (16 byte
>>>>>>>>> assigned vendor ID) which means we can get the information to set
>>>>>>>>> the timeout before we have to do any TPM operations.
>>>>>>>> 
>>>>>>>> I wonder if the timeout issue exists for all TPM commands for the
>>>>>>>> same manufacturer.  For example, does the ATMEL TPM also crash when 
>>>>>>>> extending  PCRs ?
>>>>>>>> 
>>>>>>>> In addition to defining a per TPM vendor based lookup table for
>>>>>>>> timeout, would it be a good idea to also define a Kconfig/boot param
>>>>>>>> option to allow timeout setting.  This will enable to set the timeout
>>>>>>>> based on the specific use.
>>>>>>> 
>>>>>>> I don't think we need go that far (yet).  The timing change has been in
>>>>>>> upstream since:
>>>>>>> 
>>>>>>> commit 424eaf910c329ab06ad03a527ef45dcf6a328f00
>>>>>>> Author: Nayna Jain <nayna@linux.vnet.ibm.com>
>>>>>>> Date:   Wed May 16 01:51:25 2018 -0400
>>>>>>> 
>>>>>>> tpm: reduce polling time to usecs for even finer granularity
>>>>>>> 
>>>>>>> Which was in the released kernel 4.18: over two years ago.  In all that
>>>>>>> time we've discovered two problems: mine which looks to be an artifact
>>>>>>> of an experimental upgrade process in a new nuvoton and the Atmel. 
>>>>>>> That means pretty much every other TPM simply works with the existing
>>>>>>> timings
>>>>>>> 
>>>>>>>> I was also thinking how will we decide the lookup table values for
>>>>>>>> each vendor ?
>>>>>>> 
>>>>>>> I wasn't thinking we would.  I was thinking I'd do a simple exception
>>>>>>> for the Atmel and nothing else.  I don't think my Nuvoton is in any way
>>>>>>> characteristic.  Indeed my pluggable TPM rainbow bridge system works
>>>>>>> just fine with a Nuvoton and the current timings.
>>>>>>> 
>>>>>>> We can add additional exceptions if they actually turn up.
>>>>>> 
>>>>>> I'd add a table and fallback.
>>>>>> 
>>>>> 
>>>>> Hi folks,
>>>>> 
>>>>> I want to follow up this a bit and check whether we reached a consensus 
>>>>> on how to fix the timeout issue for Atmel chip.
>>>>> 
>>>>> Should we revert the changes or introduce the lookup table for chips.
>>>>> 
>>>>> Is there anything I can help from Rubrik side.
>>>>> 
>>>>> Thanks
>>>>> Hao
>>>> 
>>>> There is nothing to revert as the previous was not applied but I'm
>>>> of course ready to review any new attempts.
>>>> 
>>> 
>>> Hi Jarkko,
>>> 
>>> By “revert” I meant we revert the timeout value changes by applying
>>> the patch I proposed, as the timeout value discussed does cause issues.
>>> 
>>> Why don’t we apply the patch and improve the perf in the way of not
>>> breaking TPMs ? 
>>> 
>>> Hao
>> 
>> Hi Jarkko and folks,
>> 
>> It’s being a while since our last discussion. I want to push a fix in the upstream for ateml chip. 
>> It looks like we currently have following choices:
>> 1.  generic fix for all vendors: have a lookup table for sleep time of wait_for_tpm_stat 
>>  (i.e. TPM_TIMEOUT_WAIT_STAT in my proposed patch) 
>> 2.  quick fix for the regression: change the sleep time of wait_for_tpm_stat back to 15ms.
>>  It is the current proposed patch
>> 3. Fix regression by making exception for ateml chip.  
>> 
>> Should we reach consensus on which one we want to pursue before dig
>> into implementation of the patch? In my opinion, I prefer to fix the
>> regression with 2, and then pursue 1 as long-term solution. 3 is
>> hacky.
> 
> What does option 1 fix for *all* vendors?
> 
>> Let me know what do you guys think
>> 
>> Hao
> 
> /Jarkko

I meant timing is potentially sensitive per vendor,
the option 1 is a solution for all vendor if this issue happening again.
The option 1 is just what you mentioned in the earlier discussion:
>>>>>> I'd add a table and fallback.


Hao


^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: [PATCH] Fix Atmel TPM crash caused by too frequent queries
  2020-11-14  4:39                                           ` Hao Wu
@ 2020-11-18 21:11                                             ` Jarkko Sakkinen
  2020-11-18 23:23                                               ` Hao Wu
  2021-05-09  6:18                                               ` Hao Wu
  0 siblings, 2 replies; 91+ messages in thread
From: Jarkko Sakkinen @ 2020-11-18 21:11 UTC (permalink / raw)
  To: Hao Wu
  Cc: James Bottomley, Nayna, peterhuewe, jgg, arnd, gregkh,
	Hamza Attak, why2jjj.linux, zohar, linux-integrity, Paul Menzel,
	Ken Goldman, Seungyeop Han, Shrihari Kalkar, Anish Jhaveri

On Fri, Nov 13, 2020 at 08:39:28PM -0800, Hao Wu wrote:
> > On Oct 17, 2020, at 10:20 PM, Hao Wu <hao.wu@rubrik.com> wrote:
> > 
> >> On Oct 17, 2020, at 10:09 PM, Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com> wrote:
> >> 
> >> On Fri, Oct 16, 2020 at 11:11:37PM -0700, Hao Wu wrote:
> >>>> On Oct 1, 2020, at 4:04 PM, Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com> wrote:
> >>>> 
> >>>> On Thu, Oct 01, 2020 at 11:32:59AM -0700, James Bottomley wrote:
> >>>>> On Thu, 2020-10-01 at 14:15 -0400, Nayna wrote:
> >>>>>> On 10/1/20 12:53 AM, James Bottomley wrote:
> >>>>>>> On Thu, 2020-10-01 at 04:50 +0300, Jarkko Sakkinen wrote:
> >>>>>>>> On Wed, Sep 30, 2020 at 03:31:20PM -0700, James Bottomley wrote:
> >>>>>>>>> On Thu, 2020-10-01 at 00:09 +0300, Jarkko Sakkinen wrote:
> >>>>> [...]
> >>>>>>>>>> I also wonder if we could adjust the frequency dynamically.
> >>>>>>>>>> I.e. start with optimistic value and lower it until finding
> >>>>>>>>>> the sweet spot.
> >>>>>>>>> 
> >>>>>>>>> The problem is the way this crashes: the TPM seems to be
> >>>>>>>>> unrecoverable. If it were recoverable without a hard reset of
> >>>>>>>>> the entire machine, we could certainly play around with it.  I
> >>>>>>>>> can try alternative mechanisms to see if anything's viable, but
> >>>>>>>>> to all intents and purposes, it looks like my TPM simply stops
> >>>>>>>>> responding to the TIS interface.
> >>>>>>>> 
> >>>>>>>> A quickly scraped idea probably with some holes in it but I was
> >>>>>>>> thinking something like
> >>>>>>>> 
> >>>>>>>> 1. Initially set slow value for latency, this could be the
> >>>>>>>> original 15 ms.
> >>>>>>>> 2. Use this to read TPM_PT_VENDOR_STRING_*.
> >>>>>>>> 3. Lookup based vendor string from a fixup table a latency that
> >>>>>>>> works
> >>>>>>>>  (the fallback latency could be the existing latency).
> >>>>>>> 
> >>>>>>> Well, yes, that was sort of what I was thinking of doing for the
> >>>>>>> Atmel ... except I was thinking of using the TIS VID (16 byte
> >>>>>>> assigned vendor ID) which means we can get the information to set
> >>>>>>> the timeout before we have to do any TPM operations.
> >>>>>> 
> >>>>>> I wonder if the timeout issue exists for all TPM commands for the
> >>>>>> same manufacturer.  For example, does the ATMEL TPM also crash when 
> >>>>>> extending  PCRs ?
> >>>>>> 
> >>>>>> In addition to defining a per TPM vendor based lookup table for
> >>>>>> timeout, would it be a good idea to also define a Kconfig/boot param
> >>>>>> option to allow timeout setting.  This will enable to set the timeout
> >>>>>> based on the specific use.
> >>>>> 
> >>>>> I don't think we need go that far (yet).  The timing change has been in
> >>>>> upstream since:
> >>>>> 
> >>>>> commit 424eaf910c329ab06ad03a527ef45dcf6a328f00
> >>>>> Author: Nayna Jain <nayna@linux.vnet.ibm.com>
> >>>>> Date:   Wed May 16 01:51:25 2018 -0400
> >>>>> 
> >>>>>  tpm: reduce polling time to usecs for even finer granularity
> >>>>> 
> >>>>> Which was in the released kernel 4.18: over two years ago.  In all that
> >>>>> time we've discovered two problems: mine which looks to be an artifact
> >>>>> of an experimental upgrade process in a new nuvoton and the Atmel. 
> >>>>> That means pretty much every other TPM simply works with the existing
> >>>>> timings
> >>>>> 
> >>>>>> I was also thinking how will we decide the lookup table values for
> >>>>>> each vendor ?
> >>>>> 
> >>>>> I wasn't thinking we would.  I was thinking I'd do a simple exception
> >>>>> for the Atmel and nothing else.  I don't think my Nuvoton is in any way
> >>>>> characteristic.  Indeed my pluggable TPM rainbow bridge system works
> >>>>> just fine with a Nuvoton and the current timings.
> >>>>> 
> >>>>> We can add additional exceptions if they actually turn up.
> >>>> 
> >>>> I'd add a table and fallback.
> >>>> 
> >>> 
> >>> Hi folks,
> >>> 
> >>> I want to follow up this a bit and check whether we reached a consensus 
> >>> on how to fix the timeout issue for Atmel chip.
> >>> 
> >>> Should we revert the changes or introduce the lookup table for chips.
> >>> 
> >>> Is there anything I can help from Rubrik side.
> >>> 
> >>> Thanks
> >>> Hao
> >> 
> >> There is nothing to revert as the previous was not applied but I'm
> >> of course ready to review any new attempts.
> >> 
> > 
> > Hi Jarkko,
> > 
> > By “revert” I meant we revert the timeout value changes by applying
> > the patch I proposed, as the timeout value discussed does cause issues.
> > 
> > Why don’t we apply the patch and improve the perf in the way of not
> > breaking TPMs ? 
> > 
> > Hao
> 
> Hi Jarkko and folks,
> 
> It’s being a while since our last discussion. I want to push a fix in the upstream for ateml chip. 
> It looks like we currently have following choices:
> 1.  generic fix for all vendors: have a lookup table for sleep time of wait_for_tpm_stat 
>   (i.e. TPM_TIMEOUT_WAIT_STAT in my proposed patch) 
> 2.  quick fix for the regression: change the sleep time of wait_for_tpm_stat back to 15ms.
>   It is the current proposed patch
> 3. Fix regression by making exception for ateml chip.  
> 
> Should we reach consensus on which one we want to pursue before dig
> into implementation of the patch? In my opinion, I prefer to fix the
> regression with 2, and then pursue 1 as long-term solution. 3 is
> hacky.

What does option 1 fix for *all* vendors?

> Let me know what do you guys think
> 
> Hao

/Jarkko

^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: [PATCH] Fix Atmel TPM crash caused by too frequent queries
  2020-10-18  5:20                                         ` Hao Wu
@ 2020-11-14  4:39                                           ` Hao Wu
  2020-11-18 21:11                                             ` Jarkko Sakkinen
  0 siblings, 1 reply; 91+ messages in thread
From: Hao Wu @ 2020-11-14  4:39 UTC (permalink / raw)
  To: Jarkko Sakkinen
  Cc: James Bottomley, Nayna, peterhuewe, jgg, arnd, gregkh,
	Hamza Attak, why2jjj.linux, zohar, linux-integrity, Paul Menzel,
	Ken Goldman, Seungyeop Han, Shrihari Kalkar, Anish Jhaveri

> On Oct 17, 2020, at 10:20 PM, Hao Wu <hao.wu@rubrik.com> wrote:
> 
>> On Oct 17, 2020, at 10:09 PM, Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com> wrote:
>> 
>> On Fri, Oct 16, 2020 at 11:11:37PM -0700, Hao Wu wrote:
>>>> On Oct 1, 2020, at 4:04 PM, Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com> wrote:
>>>> 
>>>> On Thu, Oct 01, 2020 at 11:32:59AM -0700, James Bottomley wrote:
>>>>> On Thu, 2020-10-01 at 14:15 -0400, Nayna wrote:
>>>>>> On 10/1/20 12:53 AM, James Bottomley wrote:
>>>>>>> On Thu, 2020-10-01 at 04:50 +0300, Jarkko Sakkinen wrote:
>>>>>>>> On Wed, Sep 30, 2020 at 03:31:20PM -0700, James Bottomley wrote:
>>>>>>>>> On Thu, 2020-10-01 at 00:09 +0300, Jarkko Sakkinen wrote:
>>>>> [...]
>>>>>>>>>> I also wonder if we could adjust the frequency dynamically.
>>>>>>>>>> I.e. start with optimistic value and lower it until finding
>>>>>>>>>> the sweet spot.
>>>>>>>>> 
>>>>>>>>> The problem is the way this crashes: the TPM seems to be
>>>>>>>>> unrecoverable. If it were recoverable without a hard reset of
>>>>>>>>> the entire machine, we could certainly play around with it.  I
>>>>>>>>> can try alternative mechanisms to see if anything's viable, but
>>>>>>>>> to all intents and purposes, it looks like my TPM simply stops
>>>>>>>>> responding to the TIS interface.
>>>>>>>> 
>>>>>>>> A quickly scraped idea probably with some holes in it but I was
>>>>>>>> thinking something like
>>>>>>>> 
>>>>>>>> 1. Initially set slow value for latency, this could be the
>>>>>>>> original 15 ms.
>>>>>>>> 2. Use this to read TPM_PT_VENDOR_STRING_*.
>>>>>>>> 3. Lookup based vendor string from a fixup table a latency that
>>>>>>>> works
>>>>>>>>  (the fallback latency could be the existing latency).
>>>>>>> 
>>>>>>> Well, yes, that was sort of what I was thinking of doing for the
>>>>>>> Atmel ... except I was thinking of using the TIS VID (16 byte
>>>>>>> assigned vendor ID) which means we can get the information to set
>>>>>>> the timeout before we have to do any TPM operations.
>>>>>> 
>>>>>> I wonder if the timeout issue exists for all TPM commands for the
>>>>>> same manufacturer.  For example, does the ATMEL TPM also crash when 
>>>>>> extending  PCRs ?
>>>>>> 
>>>>>> In addition to defining a per TPM vendor based lookup table for
>>>>>> timeout, would it be a good idea to also define a Kconfig/boot param
>>>>>> option to allow timeout setting.  This will enable to set the timeout
>>>>>> based on the specific use.
>>>>> 
>>>>> I don't think we need go that far (yet).  The timing change has been in
>>>>> upstream since:
>>>>> 
>>>>> commit 424eaf910c329ab06ad03a527ef45dcf6a328f00
>>>>> Author: Nayna Jain <nayna@linux.vnet.ibm.com>
>>>>> Date:   Wed May 16 01:51:25 2018 -0400
>>>>> 
>>>>>  tpm: reduce polling time to usecs for even finer granularity
>>>>> 
>>>>> Which was in the released kernel 4.18: over two years ago.  In all that
>>>>> time we've discovered two problems: mine which looks to be an artifact
>>>>> of an experimental upgrade process in a new nuvoton and the Atmel. 
>>>>> That means pretty much every other TPM simply works with the existing
>>>>> timings
>>>>> 
>>>>>> I was also thinking how will we decide the lookup table values for
>>>>>> each vendor ?
>>>>> 
>>>>> I wasn't thinking we would.  I was thinking I'd do a simple exception
>>>>> for the Atmel and nothing else.  I don't think my Nuvoton is in any way
>>>>> characteristic.  Indeed my pluggable TPM rainbow bridge system works
>>>>> just fine with a Nuvoton and the current timings.
>>>>> 
>>>>> We can add additional exceptions if they actually turn up.
>>>> 
>>>> I'd add a table and fallback.
>>>> 
>>> 
>>> Hi folks,
>>> 
>>> I want to follow up this a bit and check whether we reached a consensus 
>>> on how to fix the timeout issue for Atmel chip.
>>> 
>>> Should we revert the changes or introduce the lookup table for chips.
>>> 
>>> Is there anything I can help from Rubrik side.
>>> 
>>> Thanks
>>> Hao
>> 
>> There is nothing to revert as the previous was not applied but I'm
>> of course ready to review any new attempts.
>> 
> 
> Hi Jarkko,
> 
> By “revert” I meant we revert the timeout value changes by applying
> the patch I proposed, as the timeout value discussed does cause issues.
> 
> Why don’t we apply the patch and improve the perf in the way of not
> breaking TPMs ? 
> 
> Hao

Hi Jarkko and folks,

It’s being a while since our last discussion. I want to push a fix in the upstream for ateml chip. 
It looks like we currently have following choices:
1.  generic fix for all vendors: have a lookup table for sleep time of wait_for_tpm_stat 
  (i.e. TPM_TIMEOUT_WAIT_STAT in my proposed patch) 
2.  quick fix for the regression: change the sleep time of wait_for_tpm_stat back to 15ms.
  It is the current proposed patch
3. Fix regression by making exception for ateml chip.  

Should we reach consensus on which one we want to pursue before dig into implementation
of the patch? In my opinion, I prefer to fix the regression with 2, and then pursue 1 as long-term
solution. 3 is hacky.

Let me know what do you guys think

Hao
 


^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: [PATCH] Fix Atmel TPM crash caused by too frequent queries
  2020-10-18  5:09                                       ` Jarkko Sakkinen
@ 2020-10-18  5:20                                         ` Hao Wu
  2020-11-14  4:39                                           ` Hao Wu
  0 siblings, 1 reply; 91+ messages in thread
From: Hao Wu @ 2020-10-18  5:20 UTC (permalink / raw)
  To: Jarkko Sakkinen
  Cc: James Bottomley, Nayna, peterhuewe, jgg, arnd, gregkh,
	Hamza Attak, why2jjj.linux, zohar, linux-integrity, Paul Menzel,
	Ken Goldman, Seungyeop Han, Shrihari Kalkar, Anish Jhaveri

> On Oct 17, 2020, at 10:09 PM, Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com> wrote:
> 
> On Fri, Oct 16, 2020 at 11:11:37PM -0700, Hao Wu wrote:
>>> On Oct 1, 2020, at 4:04 PM, Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com> wrote:
>>> 
>>> On Thu, Oct 01, 2020 at 11:32:59AM -0700, James Bottomley wrote:
>>>> On Thu, 2020-10-01 at 14:15 -0400, Nayna wrote:
>>>>> On 10/1/20 12:53 AM, James Bottomley wrote:
>>>>>> On Thu, 2020-10-01 at 04:50 +0300, Jarkko Sakkinen wrote:
>>>>>>> On Wed, Sep 30, 2020 at 03:31:20PM -0700, James Bottomley wrote:
>>>>>>>> On Thu, 2020-10-01 at 00:09 +0300, Jarkko Sakkinen wrote:
>>>> [...]
>>>>>>>>> I also wonder if we could adjust the frequency dynamically.
>>>>>>>>> I.e. start with optimistic value and lower it until finding
>>>>>>>>> the sweet spot.
>>>>>>>> 
>>>>>>>> The problem is the way this crashes: the TPM seems to be
>>>>>>>> unrecoverable. If it were recoverable without a hard reset of
>>>>>>>> the entire machine, we could certainly play around with it.  I
>>>>>>>> can try alternative mechanisms to see if anything's viable, but
>>>>>>>> to all intents and purposes, it looks like my TPM simply stops
>>>>>>>> responding to the TIS interface.
>>>>>>> 
>>>>>>> A quickly scraped idea probably with some holes in it but I was
>>>>>>> thinking something like
>>>>>>> 
>>>>>>> 1. Initially set slow value for latency, this could be the
>>>>>>> original 15 ms.
>>>>>>> 2. Use this to read TPM_PT_VENDOR_STRING_*.
>>>>>>> 3. Lookup based vendor string from a fixup table a latency that
>>>>>>> works
>>>>>>>   (the fallback latency could be the existing latency).
>>>>>> 
>>>>>> Well, yes, that was sort of what I was thinking of doing for the
>>>>>> Atmel ... except I was thinking of using the TIS VID (16 byte
>>>>>> assigned vendor ID) which means we can get the information to set
>>>>>> the timeout before we have to do any TPM operations.
>>>>> 
>>>>> I wonder if the timeout issue exists for all TPM commands for the
>>>>> same manufacturer.  For example, does the ATMEL TPM also crash when 
>>>>> extending  PCRs ?
>>>>> 
>>>>> In addition to defining a per TPM vendor based lookup table for
>>>>> timeout, would it be a good idea to also define a Kconfig/boot param
>>>>> option to allow timeout setting.  This will enable to set the timeout
>>>>> based on the specific use.
>>>> 
>>>> I don't think we need go that far (yet).  The timing change has been in
>>>> upstream since:
>>>> 
>>>> commit 424eaf910c329ab06ad03a527ef45dcf6a328f00
>>>> Author: Nayna Jain <nayna@linux.vnet.ibm.com>
>>>> Date:   Wed May 16 01:51:25 2018 -0400
>>>> 
>>>>   tpm: reduce polling time to usecs for even finer granularity
>>>> 
>>>> Which was in the released kernel 4.18: over two years ago.  In all that
>>>> time we've discovered two problems: mine which looks to be an artifact
>>>> of an experimental upgrade process in a new nuvoton and the Atmel. 
>>>> That means pretty much every other TPM simply works with the existing
>>>> timings
>>>> 
>>>>> I was also thinking how will we decide the lookup table values for
>>>>> each vendor ?
>>>> 
>>>> I wasn't thinking we would.  I was thinking I'd do a simple exception
>>>> for the Atmel and nothing else.  I don't think my Nuvoton is in any way
>>>> characteristic.  Indeed my pluggable TPM rainbow bridge system works
>>>> just fine with a Nuvoton and the current timings.
>>>> 
>>>> We can add additional exceptions if they actually turn up.
>>> 
>>> I'd add a table and fallback.
>>> 
>> 
>> Hi folks,
>> 
>> I want to follow up this a bit and check whether we reached a consensus 
>> on how to fix the timeout issue for Atmel chip.
>> 
>> Should we revert the changes or introduce the lookup table for chips.
>> 
>> Is there anything I can help from Rubrik side.
>> 
>> Thanks
>> Hao
> 
> There is nothing to revert as the previous was not applied but I'm
> of course ready to review any new attempts.
> 

Hi Jarkko,

By “revert” I meant we revert the timeout value changes by applying
the patch I proposed, as the timeout value discussed does cause issues.

Why don’t we apply the patch and improve the perf in the way of not
breaking TPMs ? 

Hao
 


^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: [PATCH] Fix Atmel TPM crash caused by too frequent queries
  2020-10-17  6:11                                     ` Hao Wu
@ 2020-10-18  5:09                                       ` Jarkko Sakkinen
  2020-10-18  5:20                                         ` Hao Wu
  0 siblings, 1 reply; 91+ messages in thread
From: Jarkko Sakkinen @ 2020-10-18  5:09 UTC (permalink / raw)
  To: Hao Wu
  Cc: James Bottomley, Nayna, peterhuewe, jgg, arnd, gregkh,
	Hamza Attak, why2jjj.linux, zohar, linux-integrity, Paul Menzel,
	Ken Goldman, Seungyeop Han, Shrihari Kalkar, Anish Jhaveri

On Fri, Oct 16, 2020 at 11:11:37PM -0700, Hao Wu wrote:
> > On Oct 1, 2020, at 4:04 PM, Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com> wrote:
> > 
> > On Thu, Oct 01, 2020 at 11:32:59AM -0700, James Bottomley wrote:
> >> On Thu, 2020-10-01 at 14:15 -0400, Nayna wrote:
> >>> On 10/1/20 12:53 AM, James Bottomley wrote:
> >>>> On Thu, 2020-10-01 at 04:50 +0300, Jarkko Sakkinen wrote:
> >>>>> On Wed, Sep 30, 2020 at 03:31:20PM -0700, James Bottomley wrote:
> >>>>>> On Thu, 2020-10-01 at 00:09 +0300, Jarkko Sakkinen wrote:
> >> [...]
> >>>>>>> I also wonder if we could adjust the frequency dynamically.
> >>>>>>> I.e. start with optimistic value and lower it until finding
> >>>>>>> the sweet spot.
> >>>>>> 
> >>>>>> The problem is the way this crashes: the TPM seems to be
> >>>>>> unrecoverable. If it were recoverable without a hard reset of
> >>>>>> the entire machine, we could certainly play around with it.  I
> >>>>>> can try alternative mechanisms to see if anything's viable, but
> >>>>>> to all intents and purposes, it looks like my TPM simply stops
> >>>>>> responding to the TIS interface.
> >>>>> 
> >>>>> A quickly scraped idea probably with some holes in it but I was
> >>>>> thinking something like
> >>>>> 
> >>>>> 1. Initially set slow value for latency, this could be the
> >>>>> original 15 ms.
> >>>>> 2. Use this to read TPM_PT_VENDOR_STRING_*.
> >>>>> 3. Lookup based vendor string from a fixup table a latency that
> >>>>> works
> >>>>>    (the fallback latency could be the existing latency).
> >>>> 
> >>>> Well, yes, that was sort of what I was thinking of doing for the
> >>>> Atmel ... except I was thinking of using the TIS VID (16 byte
> >>>> assigned vendor ID) which means we can get the information to set
> >>>> the timeout before we have to do any TPM operations.
> >>> 
> >>> I wonder if the timeout issue exists for all TPM commands for the
> >>> same manufacturer.  For example, does the ATMEL TPM also crash when 
> >>> extending  PCRs ?
> >>> 
> >>> In addition to defining a per TPM vendor based lookup table for
> >>> timeout, would it be a good idea to also define a Kconfig/boot param
> >>> option to allow timeout setting.  This will enable to set the timeout
> >>> based on the specific use.
> >> 
> >> I don't think we need go that far (yet).  The timing change has been in
> >> upstream since:
> >> 
> >> commit 424eaf910c329ab06ad03a527ef45dcf6a328f00
> >> Author: Nayna Jain <nayna@linux.vnet.ibm.com>
> >> Date:   Wed May 16 01:51:25 2018 -0400
> >> 
> >>    tpm: reduce polling time to usecs for even finer granularity
> >> 
> >> Which was in the released kernel 4.18: over two years ago.  In all that
> >> time we've discovered two problems: mine which looks to be an artifact
> >> of an experimental upgrade process in a new nuvoton and the Atmel. 
> >> That means pretty much every other TPM simply works with the existing
> >> timings
> >> 
> >>> I was also thinking how will we decide the lookup table values for
> >>> each vendor ?
> >> 
> >> I wasn't thinking we would.  I was thinking I'd do a simple exception
> >> for the Atmel and nothing else.  I don't think my Nuvoton is in any way
> >> characteristic.  Indeed my pluggable TPM rainbow bridge system works
> >> just fine with a Nuvoton and the current timings.
> >> 
> >> We can add additional exceptions if they actually turn up.
> > 
> > I'd add a table and fallback.
> > 
> 
> Hi folks,
> 
> I want to follow up this a bit and check whether we reached a consensus 
> on how to fix the timeout issue for Atmel chip.
> 
> Should we revert the changes or introduce the lookup table for chips.
> 
> Is there anything I can help from Rubrik side.
> 
> Thanks
> Hao

There is nothing to revert as the previous was not applied but I'm
of course ready to review any new attempts.

/Jarkko

^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: [PATCH] Fix Atmel TPM crash caused by too frequent queries
  2020-10-01 23:04                                   ` Jarkko Sakkinen
@ 2020-10-17  6:11                                     ` Hao Wu
  2020-10-18  5:09                                       ` Jarkko Sakkinen
  0 siblings, 1 reply; 91+ messages in thread
From: Hao Wu @ 2020-10-17  6:11 UTC (permalink / raw)
  To: Jarkko Sakkinen
  Cc: James Bottomley, Nayna, peterhuewe, jgg, arnd, gregkh,
	Hamza Attak, why2jjj.linux, zohar, linux-integrity, Paul Menzel,
	Ken Goldman, Seungyeop Han, Shrihari Kalkar, Anish Jhaveri

> On Oct 1, 2020, at 4:04 PM, Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com> wrote:
> 
> On Thu, Oct 01, 2020 at 11:32:59AM -0700, James Bottomley wrote:
>> On Thu, 2020-10-01 at 14:15 -0400, Nayna wrote:
>>> On 10/1/20 12:53 AM, James Bottomley wrote:
>>>> On Thu, 2020-10-01 at 04:50 +0300, Jarkko Sakkinen wrote:
>>>>> On Wed, Sep 30, 2020 at 03:31:20PM -0700, James Bottomley wrote:
>>>>>> On Thu, 2020-10-01 at 00:09 +0300, Jarkko Sakkinen wrote:
>> [...]
>>>>>>> I also wonder if we could adjust the frequency dynamically.
>>>>>>> I.e. start with optimistic value and lower it until finding
>>>>>>> the sweet spot.
>>>>>> 
>>>>>> The problem is the way this crashes: the TPM seems to be
>>>>>> unrecoverable. If it were recoverable without a hard reset of
>>>>>> the entire machine, we could certainly play around with it.  I
>>>>>> can try alternative mechanisms to see if anything's viable, but
>>>>>> to all intents and purposes, it looks like my TPM simply stops
>>>>>> responding to the TIS interface.
>>>>> 
>>>>> A quickly scraped idea probably with some holes in it but I was
>>>>> thinking something like
>>>>> 
>>>>> 1. Initially set slow value for latency, this could be the
>>>>> original 15 ms.
>>>>> 2. Use this to read TPM_PT_VENDOR_STRING_*.
>>>>> 3. Lookup based vendor string from a fixup table a latency that
>>>>> works
>>>>>    (the fallback latency could be the existing latency).
>>>> 
>>>> Well, yes, that was sort of what I was thinking of doing for the
>>>> Atmel ... except I was thinking of using the TIS VID (16 byte
>>>> assigned vendor ID) which means we can get the information to set
>>>> the timeout before we have to do any TPM operations.
>>> 
>>> I wonder if the timeout issue exists for all TPM commands for the
>>> same manufacturer.  For example, does the ATMEL TPM also crash when 
>>> extending  PCRs ?
>>> 
>>> In addition to defining a per TPM vendor based lookup table for
>>> timeout, would it be a good idea to also define a Kconfig/boot param
>>> option to allow timeout setting.  This will enable to set the timeout
>>> based on the specific use.
>> 
>> I don't think we need go that far (yet).  The timing change has been in
>> upstream since:
>> 
>> commit 424eaf910c329ab06ad03a527ef45dcf6a328f00
>> Author: Nayna Jain <nayna@linux.vnet.ibm.com>
>> Date:   Wed May 16 01:51:25 2018 -0400
>> 
>>    tpm: reduce polling time to usecs for even finer granularity
>> 
>> Which was in the released kernel 4.18: over two years ago.  In all that
>> time we've discovered two problems: mine which looks to be an artifact
>> of an experimental upgrade process in a new nuvoton and the Atmel. 
>> That means pretty much every other TPM simply works with the existing
>> timings
>> 
>>> I was also thinking how will we decide the lookup table values for
>>> each vendor ?
>> 
>> I wasn't thinking we would.  I was thinking I'd do a simple exception
>> for the Atmel and nothing else.  I don't think my Nuvoton is in any way
>> characteristic.  Indeed my pluggable TPM rainbow bridge system works
>> just fine with a Nuvoton and the current timings.
>> 
>> We can add additional exceptions if they actually turn up.
> 
> I'd add a table and fallback.
> 

Hi folks,

I want to follow up this a bit and check whether we reached a consensus 
on how to fix the timeout issue for Atmel chip.

Should we revert the changes or introduce the lookup table for chips.

Is there anything I can help from Rubrik side.

Thanks
Hao
 


^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: [PATCH] Fix Atmel TPM crash caused by too frequent queries
  2020-10-01 18:32                                 ` James Bottomley
@ 2020-10-01 23:04                                   ` Jarkko Sakkinen
  2020-10-17  6:11                                     ` Hao Wu
  0 siblings, 1 reply; 91+ messages in thread
From: Jarkko Sakkinen @ 2020-10-01 23:04 UTC (permalink / raw)
  To: James Bottomley
  Cc: Nayna, Hao Wu, peterhuewe, jgg, arnd, gregkh, Hamza Attak,
	why2jjj.linux, zohar, linux-integrity, Paul Menzel, Ken Goldman,
	Seungyeop Han, Shrihari Kalkar, Anish Jhaveri

On Thu, Oct 01, 2020 at 11:32:59AM -0700, James Bottomley wrote:
> On Thu, 2020-10-01 at 14:15 -0400, Nayna wrote:
> > On 10/1/20 12:53 AM, James Bottomley wrote:
> > > On Thu, 2020-10-01 at 04:50 +0300, Jarkko Sakkinen wrote:
> > > > On Wed, Sep 30, 2020 at 03:31:20PM -0700, James Bottomley wrote:
> > > > > On Thu, 2020-10-01 at 00:09 +0300, Jarkko Sakkinen wrote:
> [...]
> > > > > > I also wonder if we could adjust the frequency dynamically.
> > > > > > I.e. start with optimistic value and lower it until finding
> > > > > > the sweet spot.
> > > > >  
> > > > > The problem is the way this crashes: the TPM seems to be
> > > > > unrecoverable. If it were recoverable without a hard reset of
> > > > > the entire machine, we could certainly play around with it.  I
> > > > > can try alternative mechanisms to see if anything's viable, but
> > > > > to all intents and purposes, it looks like my TPM simply stops
> > > > > responding to the TIS interface.
> > > >  
> > > > A quickly scraped idea probably with some holes in it but I was
> > > > thinking something like
> > > > 
> > > > 1. Initially set slow value for latency, this could be the
> > > > original 15 ms.
> > > > 2. Use this to read TPM_PT_VENDOR_STRING_*.
> > > > 3. Lookup based vendor string from a fixup table a latency that
> > > > works
> > > >     (the fallback latency could be the existing latency).
> > >  
> > > Well, yes, that was sort of what I was thinking of doing for the
> > > Atmel ... except I was thinking of using the TIS VID (16 byte
> > > assigned vendor ID) which means we can get the information to set
> > > the timeout before we have to do any TPM operations.
> > 
> > I wonder if the timeout issue exists for all TPM commands for the
> > same manufacturer.  For example, does the ATMEL TPM also crash when 
> > extending  PCRs ?
> > 
> > In addition to defining a per TPM vendor based lookup table for
> > timeout, would it be a good idea to also define a Kconfig/boot param
> > option to allow timeout setting.  This will enable to set the timeout
> > based on the specific use.
> 
> I don't think we need go that far (yet).  The timing change has been in
> upstream since:
> 
> commit 424eaf910c329ab06ad03a527ef45dcf6a328f00
> Author: Nayna Jain <nayna@linux.vnet.ibm.com>
> Date:   Wed May 16 01:51:25 2018 -0400
> 
>     tpm: reduce polling time to usecs for even finer granularity
>  
> Which was in the released kernel 4.18: over two years ago.  In all that
> time we've discovered two problems: mine which looks to be an artifact
> of an experimental upgrade process in a new nuvoton and the Atmel. 
> That means pretty much every other TPM simply works with the existing
> timings
> 
> > I was also thinking how will we decide the lookup table values for
> > each vendor ?
> 
> I wasn't thinking we would.  I was thinking I'd do a simple exception
> for the Atmel and nothing else.  I don't think my Nuvoton is in any way
> characteristic.  Indeed my pluggable TPM rainbow bridge system works
> just fine with a Nuvoton and the current timings.
> 
> We can add additional exceptions if they actually turn up.

I'd add a table and fallback.

/Jarkko

^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: [PATCH] Fix Atmel TPM crash caused by too frequent queries
  2020-10-01 18:15                               ` Nayna
@ 2020-10-01 18:32                                 ` James Bottomley
  2020-10-01 23:04                                   ` Jarkko Sakkinen
  0 siblings, 1 reply; 91+ messages in thread
From: James Bottomley @ 2020-10-01 18:32 UTC (permalink / raw)
  To: Nayna, Jarkko Sakkinen
  Cc: Hao Wu, peterhuewe, jgg, arnd, gregkh, Hamza Attak,
	why2jjj.linux, zohar, linux-integrity, Paul Menzel, Ken Goldman,
	Seungyeop Han, Shrihari Kalkar, Anish Jhaveri

On Thu, 2020-10-01 at 14:15 -0400, Nayna wrote:
> On 10/1/20 12:53 AM, James Bottomley wrote:
> > On Thu, 2020-10-01 at 04:50 +0300, Jarkko Sakkinen wrote:
> > > On Wed, Sep 30, 2020 at 03:31:20PM -0700, James Bottomley wrote:
> > > > On Thu, 2020-10-01 at 00:09 +0300, Jarkko Sakkinen wrote:
[...]
> > > > > I also wonder if we could adjust the frequency dynamically.
> > > > > I.e. start with optimistic value and lower it until finding
> > > > > the sweet spot.
> > > >  
> > > > The problem is the way this crashes: the TPM seems to be
> > > > unrecoverable. If it were recoverable without a hard reset of
> > > > the entire machine, we could certainly play around with it.  I
> > > > can try alternative mechanisms to see if anything's viable, but
> > > > to all intents and purposes, it looks like my TPM simply stops
> > > > responding to the TIS interface.
> > >  
> > > A quickly scraped idea probably with some holes in it but I was
> > > thinking something like
> > > 
> > > 1. Initially set slow value for latency, this could be the
> > > original 15 ms.
> > > 2. Use this to read TPM_PT_VENDOR_STRING_*.
> > > 3. Lookup based vendor string from a fixup table a latency that
> > > works
> > >     (the fallback latency could be the existing latency).
> >  
> > Well, yes, that was sort of what I was thinking of doing for the
> > Atmel ... except I was thinking of using the TIS VID (16 byte
> > assigned vendor ID) which means we can get the information to set
> > the timeout before we have to do any TPM operations.
> 
> I wonder if the timeout issue exists for all TPM commands for the
> same manufacturer.  For example, does the ATMEL TPM also crash when 
> extending  PCRs ?
> 
> In addition to defining a per TPM vendor based lookup table for
> timeout, would it be a good idea to also define a Kconfig/boot param
> option to allow timeout setting.  This will enable to set the timeout
> based on the specific use.

I don't think we need go that far (yet).  The timing change has been in
upstream since:

commit 424eaf910c329ab06ad03a527ef45dcf6a328f00
Author: Nayna Jain <nayna@linux.vnet.ibm.com>
Date:   Wed May 16 01:51:25 2018 -0400

    tpm: reduce polling time to usecs for even finer granularity
 
Which was in the released kernel 4.18: over two years ago.  In all that
time we've discovered two problems: mine which looks to be an artifact
of an experimental upgrade process in a new nuvoton and the Atmel. 
That means pretty much every other TPM simply works with the existing
timings

> I was also thinking how will we decide the lookup table values for
> each vendor ?

I wasn't thinking we would.  I was thinking I'd do a simple exception
for the Atmel and nothing else.  I don't think my Nuvoton is in any way
characteristic.  Indeed my pluggable TPM rainbow bridge system works
just fine with a Nuvoton and the current timings.

We can add additional exceptions if they actually turn up.

James



^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: [PATCH] Fix Atmel TPM crash caused by too frequent queries
  2020-10-01  4:53                             ` James Bottomley
@ 2020-10-01 18:15                               ` Nayna
  2020-10-01 18:32                                 ` James Bottomley
  0 siblings, 1 reply; 91+ messages in thread
From: Nayna @ 2020-10-01 18:15 UTC (permalink / raw)
  To: James Bottomley, Jarkko Sakkinen
  Cc: Hao Wu, peterhuewe, jgg, arnd, gregkh, Hamza Attak,
	why2jjj.linux, zohar, linux-integrity, Paul Menzel, Ken Goldman,
	Seungyeop Han, Shrihari Kalkar, Anish Jhaveri


On 10/1/20 12:53 AM, James Bottomley wrote:
> On Thu, 2020-10-01 at 04:50 +0300, Jarkko Sakkinen wrote:
>> On Wed, Sep 30, 2020 at 03:31:20PM -0700, James Bottomley wrote:
>>> On Thu, 2020-10-01 at 00:09 +0300, Jarkko Sakkinen wrote:
>>>> On Wed, Sep 30, 2020 at 01:48:15PM -0700, James Bottomley wrote:
>>>>> On Wed, 2020-09-30 at 18:37 +0300, Jarkko Sakkinen wrote:
>>>>>> On Wed, Sep 30, 2020 at 07:54:58AM -0700, James Bottomley
>>>>>> wrote:
>>>>>>> On Wed, 2020-09-30 at 05:16 +0300, Jarkko Sakkinen wrote:
>>>>>>>> On Mon, Sep 28, 2020 at 03:11:39PM -0700, James Bottomley
>>>>>>>> wrote:
>>>>>>>>> On Sun, 2020-09-27 at 22:59 -0700, Hao Wu wrote:
>>>>>>>>> [...]
>>>>>>>>>>> However, there is another possibility: it's
>>>>>>>>>>> something to do with the byte read; I notice you
>>>>>>>>>>> don't require the same slowdown for the burst count
>>>>>>>>>>> read, which actually reads the status register and
>>>>>>>>>>> burst count as a read32.  If that really is the
>>>>>>>>>>> case, for the atmel would substituting a read32 and
>>>>>>>>>>> just throwing the upper bytes away in
>>>>>>>>>>> tpm_tis_status() allow us to keep the current
>>>>>>>>>>> timings?  I can actually try doing this and see if
>>>>>>>>>>> it fixes my nuvoton.
>>>>>>>>>> If would be helpful if you can find the solution
>>>>>>>>>> without reducing performance. I think it is a
>>>>>>>>>> separate problem to address though. Maybe not worth
>>>>>>>>>> to mix them in the same fix.
>>>>>>>>> Well, if it works, no other fix is needed.
>>>>>>>>>
>>>>>>>>> This is what I'm currently trying out on my nuvoton
>>>>>>>>> with the timings reverted to being those in the vanilla
>>>>>>>>> kernel.  So far it hasn't crashed, but I haven't run it
>>>>>>>>> for long enough to be sure yet.
>>>>>>>>>
>>>>>>>>> James
>>>>>>>> OK, so the bus does not like one byte reads but prefers
>>>>>>>> full (32-bit) word reads? I.e. what's the context?
>>>>>>> It's not supported by anything in the spec just empirical
>>>>>>> observation.  However, the spec says the status register is
>>>>>>> 24 bits: the upper 16 being the burst count.  When we read
>>>>>>> the whole status register, including the burst count, we do
>>>>>>> a read32. I observed that the elongated timing was only
>>>>>>> added for the read8 code not the read32 which supports the
>>>>>>> theory that the former causes the Atmel to crash but the
>>>>>>> latter doesn't.  Of course it's always possible that
>>>>>>> probabilistically the Atmel is going to crash on the burst
>>>>>>> count read, but that's exercised far less than the status
>>>>>>> only read.
>>>>>> This paragraph is good enough explanation for me. Can you
>>>>>> include it to the final commit as soon as we hear how your
>>>>>> fix works for Hao?
>>>>> Sure.  I'm afraid I have to report that it didn't work for
>>>>> me.  My Nuvoton is definitely annoyed by the frequency of the
>>>>> prodding rather than the register width.
>>>> Sorry, this might have been stated at some point but what type of
>>>> bus is it connected with?
>>> It's hard to tell: this is my Dell Laptop, but I'd have to bet LPC.
>>>
>>>> Does it help in any way to tune the frequency?
>>> specific memory mapped address and all the conversion to the LPC
>>> back end is done by memory read/write operations.  The TPM itself
>>> has a clock but doesn't give the TIS interface software control.
>> Some TPM's use tpm_tis_spi instead of MMIO.
> Yes, but I'm fairly certain mine's not SPI.
>
>>>> I also wonder if we could adjust the frequency dynamically. I.e.
>>>> start with optimistic value and lower it until finding the sweet
>>>> spot.
>>> The problem is the way this crashes: the TPM seems to be
>>> unrecoverable. If it were recoverable without a hard reset of the
>>> entire machine, we could certainly play around with it.  I can try
>>> alternative mechanisms to see if anything's viable, but to all
>>> intents and purposes, it looks like my TPM simply stops responding
>>> to the TIS interface.
>> A quickly scraped idea probably with some holes in it but I was
>> thinking something like
>>
>> 1. Initially set slow value for latency, this could be the original
>> 15 ms.
>> 2. Use this to read TPM_PT_VENDOR_STRING_*.
>> 3. Lookup based vendor string from a fixup table a latency that works
>>     (the fallback latency could be the existing latency).
> Well, yes, that was sort of what I was thinking of doing for the Atmel
> ... except I was thinking of using the TIS VID (16 byte assigned vendor
> ID) which means we can get the information to set the timeout before we
> have to do any TPM operations.


I wonder if the timeout issue exists for all TPM commands for the same 
manufacturer.  For example, does the ATMEL TPM also crash when 
extending  PCRs ?

In addition to defining a per TPM vendor based lookup table for timeout, 
would it be a good idea to also define a Kconfig/boot param option to 
allow timeout setting.  This will enable to set the timeout based on the 
specific use.

I was also thinking how will we decide the lookup table values for each 
vendor ?

Thanks & Regards,

       - Nayna


^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: [PATCH] Fix Atmel TPM crash caused by too frequent queries
       [not found]                       ` <EA1EE8F8-F054-4E1B-B830-231398D33CB8@rubrik.com>
@ 2020-10-01 14:16                         ` Mimi Zohar
  0 siblings, 0 replies; 91+ messages in thread
From: Mimi Zohar @ 2020-10-01 14:16 UTC (permalink / raw)
  To: Hao Wu, Jarkko Sakkinen, James Bottomley
  Cc: Nayna Jain, peterhuewe, jgg, arnd, gregkh, Hamza Attak,
	why2jjj.linux, zohar, linux-integrity, Paul Menzel, Ken Goldman,
	Seungyeop Han, Shrihari Kalkar, Anish Jhaveri

Hi Hao,

When posting to the mailing list, please respond using plain text and
inline/bottom post.

On Wed, 2020-09-30 at 22:26 -0700, Hao Wu wrote:
> 7.1
> 
> - Jarkko’s proposal: Using msleep(TPM_TIMEOUT) for wait_for_tpm_stat(), resolve
>   the Atmel crash as we expect.  
> 
> ---
>  drivers/char/tpm/tpm_tis_core.c | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/char/tpm/tpm_tis_core.c b/drivers/char/tpm/tpm_tis_core.c
> index 65ab1b027949..7dd9bcff542d 100644
> --- a/drivers/char/tpm/tpm_tis_core.c
> +++ b/drivers/char/tpm/tpm_tis_core.c
> @@ -80,8 +80,9 @@ static int wait_for_tpm_stat(struct tpm_chip *chip, u8 mask,
>  		}
>  	} else {
>  		do {
> -			usleep_range(TPM_TIMEOUT_USECS_MIN,
> -				     TPM_TIMEOUT_USECS_MAX);
> +            msleep(TPM_TIMEOUT);
> +			// usleep_range(TPM_TIMEOUT_USECS_MIN,
> +			//	     TPM_TIMEOUT_USECS_MAX);
>  			status = chip->ops->status(chip);
>  			if ((status & mask) == mask)
>  				return 0;
> --
> 2.17.1
> 
> But I think tpm_msleep(15) is still the right way to go.

Using msleep is definitely not the right way of going.   Please refer
to commit a233a0289cf9 ("tpm: msleep() delays - replace with
usleep_range() in i2c nuvoton driver") for a detailed explanation.

thanks,

Mimi


^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: [PATCH] Fix Atmel TPM crash caused by too frequent queries
  2020-10-01  1:50                           ` Jarkko Sakkinen
@ 2020-10-01  4:53                             ` James Bottomley
  2020-10-01 18:15                               ` Nayna
  0 siblings, 1 reply; 91+ messages in thread
From: James Bottomley @ 2020-10-01  4:53 UTC (permalink / raw)
  To: Jarkko Sakkinen
  Cc: Hao Wu, peterhuewe, jgg, arnd, gregkh, Hamza Attak, nayna,
	why2jjj.linux, zohar, linux-integrity, Paul Menzel, Ken Goldman,
	Seungyeop Han, Shrihari Kalkar, Anish Jhaveri

On Thu, 2020-10-01 at 04:50 +0300, Jarkko Sakkinen wrote:
> On Wed, Sep 30, 2020 at 03:31:20PM -0700, James Bottomley wrote:
> > On Thu, 2020-10-01 at 00:09 +0300, Jarkko Sakkinen wrote:
> > > On Wed, Sep 30, 2020 at 01:48:15PM -0700, James Bottomley wrote:
> > > > On Wed, 2020-09-30 at 18:37 +0300, Jarkko Sakkinen wrote:
> > > > > On Wed, Sep 30, 2020 at 07:54:58AM -0700, James Bottomley
> > > > > wrote:
> > > > > > On Wed, 2020-09-30 at 05:16 +0300, Jarkko Sakkinen wrote:
> > > > > > > On Mon, Sep 28, 2020 at 03:11:39PM -0700, James Bottomley
> > > > > > > wrote:
> > > > > > > > On Sun, 2020-09-27 at 22:59 -0700, Hao Wu wrote:
> > > > > > > > [...]
> > > > > > > > > > However, there is another possibility: it's
> > > > > > > > > > something to do with the byte read; I notice you
> > > > > > > > > > don't require the same slowdown for the burst count
> > > > > > > > > > read, which actually reads the status register and
> > > > > > > > > > burst count as a read32.  If that really is the
> > > > > > > > > > case, for the atmel would substituting a read32 and
> > > > > > > > > > just throwing the upper bytes away in
> > > > > > > > > > tpm_tis_status() allow us to keep the current
> > > > > > > > > > timings?  I can actually try doing this and see if
> > > > > > > > > > it fixes my nuvoton.
> > > > > > > > > 
> > > > > > > > > If would be helpful if you can find the solution
> > > > > > > > > without reducing performance. I think it is a
> > > > > > > > > separate problem to address though. Maybe not worth
> > > > > > > > > to mix them in the same fix.
> > > > > > > > 
> > > > > > > > Well, if it works, no other fix is needed.
> > > > > > > > 
> > > > > > > > This is what I'm currently trying out on my nuvoton
> > > > > > > > with the timings reverted to being those in the vanilla
> > > > > > > > kernel.  So far it hasn't crashed, but I haven't run it
> > > > > > > > for long enough to be sure yet.
> > > > > > > > 
> > > > > > > > James
> > > > > > > 
> > > > > > > OK, so the bus does not like one byte reads but prefers
> > > > > > > full (32-bit) word reads? I.e. what's the context?
> > > > > > 
> > > > > > It's not supported by anything in the spec just empirical
> > > > > > observation.  However, the spec says the status register is
> > > > > > 24 bits: the upper 16 being the burst count.  When we read
> > > > > > the whole status register, including the burst count, we do
> > > > > > a read32. I observed that the elongated timing was only
> > > > > > added for the read8 code not the read32 which supports the
> > > > > > theory that the former causes the Atmel to crash but the
> > > > > > latter doesn't.  Of course it's always possible that
> > > > > > probabilistically the Atmel is going to crash on the burst
> > > > > > count read, but that's exercised far less than the status
> > > > > > only read.
> > > > > 
> > > > > This paragraph is good enough explanation for me. Can you
> > > > > include it to the final commit as soon as we hear how your
> > > > > fix works for Hao?
> > > > 
> > > > Sure.  I'm afraid I have to report that it didn't work for
> > > > me.  My Nuvoton is definitely annoyed by the frequency of the
> > > > prodding rather than the register width.
> > > 
> > > Sorry, this might have been stated at some point but what type of
> > > bus is it connected with?
> > 
> > It's hard to tell: this is my Dell Laptop, but I'd have to bet LPC.
> > 
> > > Does it help in any way to tune the frequency?
> > 
> > specific memory mapped address and all the conversion to the LPC
> > back end is done by memory read/write operations.  The TPM itself
> > has a clock but doesn't give the TIS interface software control.
> 
> Some TPM's use tpm_tis_spi instead of MMIO.

Yes, but I'm fairly certain mine's not SPI.

> > > I also wonder if we could adjust the frequency dynamically. I.e.
> > > start with optimistic value and lower it until finding the sweet
> > > spot.
> > 
> > The problem is the way this crashes: the TPM seems to be
> > unrecoverable. If it were recoverable without a hard reset of the
> > entire machine, we could certainly play around with it.  I can try
> > alternative mechanisms to see if anything's viable, but to all
> > intents and purposes, it looks like my TPM simply stops responding
> > to the TIS interface.
> 
> A quickly scraped idea probably with some holes in it but I was
> thinking something like
> 
> 1. Initially set slow value for latency, this could be the original
> 15 ms.
> 2. Use this to read TPM_PT_VENDOR_STRING_*.
> 3. Lookup based vendor string from a fixup table a latency that works
>    (the fallback latency could be the existing latency).

Well, yes, that was sort of what I was thinking of doing for the Atmel
... except I was thinking of using the TIS VID (16 byte assigned vendor
ID) which means we can get the information to set the timeout before we
have to do any TPM operations.

> 4. Set the legit latency.

James



^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: [PATCH] Fix Atmel TPM crash caused by too frequent queries
  2020-09-30 22:31                         ` James Bottomley
@ 2020-10-01  1:50                           ` Jarkko Sakkinen
  2020-10-01  4:53                             ` James Bottomley
  0 siblings, 1 reply; 91+ messages in thread
From: Jarkko Sakkinen @ 2020-10-01  1:50 UTC (permalink / raw)
  To: James Bottomley
  Cc: Hao Wu, peterhuewe, jgg, arnd, gregkh, Hamza Attak, nayna,
	why2jjj.linux, zohar, linux-integrity, Paul Menzel, Ken Goldman,
	Seungyeop Han, Shrihari Kalkar, Anish Jhaveri

On Wed, Sep 30, 2020 at 03:31:20PM -0700, James Bottomley wrote:
> On Thu, 2020-10-01 at 00:09 +0300, Jarkko Sakkinen wrote:
> > On Wed, Sep 30, 2020 at 01:48:15PM -0700, James Bottomley wrote:
> > > On Wed, 2020-09-30 at 18:37 +0300, Jarkko Sakkinen wrote:
> > > > On Wed, Sep 30, 2020 at 07:54:58AM -0700, James Bottomley wrote:
> > > > > On Wed, 2020-09-30 at 05:16 +0300, Jarkko Sakkinen wrote:
> > > > > > On Mon, Sep 28, 2020 at 03:11:39PM -0700, James Bottomley
> > > > > > wrote:
> > > > > > > On Sun, 2020-09-27 at 22:59 -0700, Hao Wu wrote:
> > > > > > > [...]
> > > > > > > > > However, there is another possibility: it's something
> > > > > > > > > to do
> > > > > > > > > with the byte read; I notice you don't require the same
> > > > > > > > > slowdown for the burst count read, which actually reads
> > > > > > > > > the
> > > > > > > > > status register and burst count as a read32.  If that
> > > > > > > > > really is the case, for the atmel would substituting a
> > > > > > > > > read32 and just throwing the upper bytes away in
> > > > > > > > > tpm_tis_status() allow us to keep the current
> > > > > > > > > timings?  I
> > > > > > > > > can actually try doing this and see if it fixes my
> > > > > > > > > nuvoton.
> > > > > > > > 
> > > > > > > > If would be helpful if you can find the solution without
> > > > > > > > reducing performance. I think it is a separate problem to
> > > > > > > > address though. Maybe not worth to mix them in the same
> > > > > > > > fix.
> > > > > > > 
> > > > > > > Well, if it works, no other fix is needed.
> > > > > > > 
> > > > > > > This is what I'm currently trying out on my nuvoton with
> > > > > > > the
> > > > > > > timings reverted to being those in the vanilla kernel.  So
> > > > > > > far
> > > > > > > it hasn't crashed, but I haven't run it for long enough to
> > > > > > > be
> > > > > > > sure yet.
> > > > > > > 
> > > > > > > James
> > > > > > 
> > > > > > OK, so the bus does not like one byte reads but prefers full
> > > > > > (32-
> > > > > > bit) word reads? I.e. what's the context?
> > > > > 
> > > > > It's not supported by anything in the spec just empirical
> > > > > observation.  However, the spec says the status register is 24
> > > > > bits: the upper 16 being the burst count.  When we read the
> > > > > whole
> > > > > status register, including the burst count, we do a read32. I
> > > > > observed that the elongated timing was only added for the read8
> > > > > code not the read32 which supports the theory that the former
> > > > > causes the Atmel to crash but the latter doesn't.  Of course
> > > > > it's
> > > > > always possible that probabilistically the Atmel is going to
> > > > > crash
> > > > > on the burst count read, but that's exercised far less than the
> > > > > status only read.
> > > > 
> > > > This paragraph is good enough explanation for me. Can you include
> > > > it
> > > > to the final commit as soon as we hear how your fix works for
> > > > Hao?
> > > 
> > > Sure.  I'm afraid I have to report that it didn't work for me.  My
> > > Nuvoton is definitely annoyed by the frequency of the prodding
> > > rather
> > > than the register width.
> > 
> > Sorry, this might have been stated at some point but what type of bus
> > is it connected with?
> 
> It's hard to tell: this is my Dell Laptop, but I'd have to bet LPC.
> 
> > Does it help in any way to tune the frequency?
> 
> Of the bus?  We simply don't have access: a TIS TPM is projected at a
> specific memory mapped address and all the conversion to the LPC back
> end is done by memory read/write operations.  The TPM itself has a
> clock but doesn't give the TIS interface software control.

Some TPM's use tpm_tis_spi instead of MMIO.

> > I also wonder if we could adjust the frequency dynamically. I.e.
> > start with optimistic value and lower it until finding the sweet
> > spot.
> 
> The problem is the way this crashes: the TPM seems to be unrecoverable.
> If it were recoverable without a hard reset of the entire machine, we
> could certainly play around with it.  I can try alternative mechanisms
> to see if anything's viable, but to all intents and purposes, it looks
> like my TPM simply stops responding to the TIS interface.

A quickly scraped idea probably with some holes in it but I was
thinking something like

1. Initially set slow value for latency, this could be the original 15
   ms.
2. Use this to read TPM_PT_VENDOR_STRING_*.
3. Lookup based vendor string from a fixup table a latency that works
   (the fallback latency could be the existing latency).
4. Set the legit latency.

/Jarkko

^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: [PATCH] Fix Atmel TPM crash caused by too frequent queries
  2020-09-30 21:09                       ` Jarkko Sakkinen
@ 2020-09-30 22:31                         ` James Bottomley
  2020-10-01  1:50                           ` Jarkko Sakkinen
  0 siblings, 1 reply; 91+ messages in thread
From: James Bottomley @ 2020-09-30 22:31 UTC (permalink / raw)
  To: Jarkko Sakkinen
  Cc: Hao Wu, peterhuewe, jgg, arnd, gregkh, Hamza Attak, nayna,
	why2jjj.linux, zohar, linux-integrity, Paul Menzel, Ken Goldman,
	Seungyeop Han, Shrihari Kalkar, Anish Jhaveri

On Thu, 2020-10-01 at 00:09 +0300, Jarkko Sakkinen wrote:
> On Wed, Sep 30, 2020 at 01:48:15PM -0700, James Bottomley wrote:
> > On Wed, 2020-09-30 at 18:37 +0300, Jarkko Sakkinen wrote:
> > > On Wed, Sep 30, 2020 at 07:54:58AM -0700, James Bottomley wrote:
> > > > On Wed, 2020-09-30 at 05:16 +0300, Jarkko Sakkinen wrote:
> > > > > On Mon, Sep 28, 2020 at 03:11:39PM -0700, James Bottomley
> > > > > wrote:
> > > > > > On Sun, 2020-09-27 at 22:59 -0700, Hao Wu wrote:
> > > > > > [...]
> > > > > > > > However, there is another possibility: it's something
> > > > > > > > to do
> > > > > > > > with the byte read; I notice you don't require the same
> > > > > > > > slowdown for the burst count read, which actually reads
> > > > > > > > the
> > > > > > > > status register and burst count as a read32.  If that
> > > > > > > > really is the case, for the atmel would substituting a
> > > > > > > > read32 and just throwing the upper bytes away in
> > > > > > > > tpm_tis_status() allow us to keep the current
> > > > > > > > timings?  I
> > > > > > > > can actually try doing this and see if it fixes my
> > > > > > > > nuvoton.
> > > > > > > 
> > > > > > > If would be helpful if you can find the solution without
> > > > > > > reducing performance. I think it is a separate problem to
> > > > > > > address though. Maybe not worth to mix them in the same
> > > > > > > fix.
> > > > > > 
> > > > > > Well, if it works, no other fix is needed.
> > > > > > 
> > > > > > This is what I'm currently trying out on my nuvoton with
> > > > > > the
> > > > > > timings reverted to being those in the vanilla kernel.  So
> > > > > > far
> > > > > > it hasn't crashed, but I haven't run it for long enough to
> > > > > > be
> > > > > > sure yet.
> > > > > > 
> > > > > > James
> > > > > 
> > > > > OK, so the bus does not like one byte reads but prefers full
> > > > > (32-
> > > > > bit) word reads? I.e. what's the context?
> > > > 
> > > > It's not supported by anything in the spec just empirical
> > > > observation.  However, the spec says the status register is 24
> > > > bits: the upper 16 being the burst count.  When we read the
> > > > whole
> > > > status register, including the burst count, we do a read32. I
> > > > observed that the elongated timing was only added for the read8
> > > > code not the read32 which supports the theory that the former
> > > > causes the Atmel to crash but the latter doesn't.  Of course
> > > > it's
> > > > always possible that probabilistically the Atmel is going to
> > > > crash
> > > > on the burst count read, but that's exercised far less than the
> > > > status only read.
> > > 
> > > This paragraph is good enough explanation for me. Can you include
> > > it
> > > to the final commit as soon as we hear how your fix works for
> > > Hao?
> > 
> > Sure.  I'm afraid I have to report that it didn't work for me.  My
> > Nuvoton is definitely annoyed by the frequency of the prodding
> > rather
> > than the register width.
> 
> Sorry, this might have been stated at some point but what type of bus
> is it connected with?

It's hard to tell: this is my Dell Laptop, but I'd have to bet LPC.

> Does it help in any way to tune the frequency?

Of the bus?  We simply don't have access: a TIS TPM is projected at a
specific memory mapped address and all the conversion to the LPC back
end is done by memory read/write operations.  The TPM itself has a
clock but doesn't give the TIS interface software control.

> I also wonder if we could adjust the frequency dynamically. I.e.
> start with optimistic value and lower it until finding the sweet
> spot.

The problem is the way this crashes: the TPM seems to be unrecoverable.
If it were recoverable without a hard reset of the entire machine, we
could certainly play around with it.  I can try alternative mechanisms
to see if anything's viable, but to all intents and purposes, it looks
like my TPM simply stops responding to the TIS interface.

James



^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: [PATCH] Fix Atmel TPM crash caused by too frequent queries
  2020-09-30 20:48                     ` James Bottomley
@ 2020-09-30 21:09                       ` Jarkko Sakkinen
  2020-09-30 22:31                         ` James Bottomley
  0 siblings, 1 reply; 91+ messages in thread
From: Jarkko Sakkinen @ 2020-09-30 21:09 UTC (permalink / raw)
  To: James Bottomley
  Cc: Hao Wu, peterhuewe, jgg, arnd, gregkh, Hamza Attak, nayna,
	why2jjj.linux, zohar, linux-integrity, Paul Menzel, Ken Goldman,
	Seungyeop Han, Shrihari Kalkar, Anish Jhaveri

On Wed, Sep 30, 2020 at 01:48:15PM -0700, James Bottomley wrote:
> On Wed, 2020-09-30 at 18:37 +0300, Jarkko Sakkinen wrote:
> > On Wed, Sep 30, 2020 at 07:54:58AM -0700, James Bottomley wrote:
> > > On Wed, 2020-09-30 at 05:16 +0300, Jarkko Sakkinen wrote:
> > > > On Mon, Sep 28, 2020 at 03:11:39PM -0700, James Bottomley wrote:
> > > > > On Sun, 2020-09-27 at 22:59 -0700, Hao Wu wrote:
> > > > > [...]
> > > > > > > However, there is another possibility: it's something to do
> > > > > > > with the byte read; I notice you don't require the same
> > > > > > > slowdown for the burst count read, which actually reads the
> > > > > > > status register and burst count as a read32.  If that
> > > > > > > really is the case, for the atmel would substituting a
> > > > > > > read32 and just throwing the upper bytes away in
> > > > > > > tpm_tis_status() allow us to keep the current timings?  I
> > > > > > > can actually try doing this and see if it fixes my nuvoton.
> > > > > > 
> > > > > > If would be helpful if you can find the solution without
> > > > > > reducing performance. I think it is a separate problem to
> > > > > > address though. Maybe not worth to mix them in the same fix.
> > > > > 
> > > > > Well, if it works, no other fix is needed.
> > > > > 
> > > > > This is what I'm currently trying out on my nuvoton with the
> > > > > timings reverted to being those in the vanilla kernel.  So far
> > > > > it hasn't crashed, but I haven't run it for long enough to be
> > > > > sure yet.
> > > > > 
> > > > > James
> > > > 
> > > > OK, so the bus does not like one byte reads but prefers full (32-
> > > > bit) word reads? I.e. what's the context?
> > > 
> > > It's not supported by anything in the spec just empirical
> > > observation.  However, the spec says the status register is 24
> > > bits: the upper 16 being the burst count.  When we read the whole
> > > status register, including the burst count, we do a read32. I
> > > observed that the elongated timing was only added for the read8
> > > code not the read32 which supports the theory that the former
> > > causes the Atmel to crash but the latter doesn't.  Of course it's
> > > always possible that probabilistically the Atmel is going to crash
> > > on the burst count read, but that's exercised far less than the
> > > status only read.
> > 
> > This paragraph is good enough explanation for me. Can you include it
> > to the final commit as soon as we hear how your fix works for Hao?
> 
> Sure.  I'm afraid I have to report that it didn't work for me.  My
> Nuvoton is definitely annoyed by the frequency of the prodding rather
> than the register width.

Sorry, this might have been stated at some point but what type of bus
is it connected with?

Does it help in any way to tune the frequency?

I also wonder if we could adjust the frequency dynamically. I.e. start
with optimistic value and lower it until finding the sweet spot.

/Jarkko

^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: [PATCH] Fix Atmel TPM crash caused by too frequent queries
  2020-09-30 15:37                   ` Jarkko Sakkinen
@ 2020-09-30 20:48                     ` James Bottomley
  2020-09-30 21:09                       ` Jarkko Sakkinen
  0 siblings, 1 reply; 91+ messages in thread
From: James Bottomley @ 2020-09-30 20:48 UTC (permalink / raw)
  To: Jarkko Sakkinen
  Cc: Hao Wu, peterhuewe, jgg, arnd, gregkh, Hamza Attak, nayna,
	why2jjj.linux, zohar, linux-integrity, Paul Menzel, Ken Goldman,
	Seungyeop Han, Shrihari Kalkar, Anish Jhaveri

On Wed, 2020-09-30 at 18:37 +0300, Jarkko Sakkinen wrote:
> On Wed, Sep 30, 2020 at 07:54:58AM -0700, James Bottomley wrote:
> > On Wed, 2020-09-30 at 05:16 +0300, Jarkko Sakkinen wrote:
> > > On Mon, Sep 28, 2020 at 03:11:39PM -0700, James Bottomley wrote:
> > > > On Sun, 2020-09-27 at 22:59 -0700, Hao Wu wrote:
> > > > [...]
> > > > > > However, there is another possibility: it's something to do
> > > > > > with the byte read; I notice you don't require the same
> > > > > > slowdown for the burst count read, which actually reads the
> > > > > > status register and burst count as a read32.  If that
> > > > > > really is the case, for the atmel would substituting a
> > > > > > read32 and just throwing the upper bytes away in
> > > > > > tpm_tis_status() allow us to keep the current timings?  I
> > > > > > can actually try doing this and see if it fixes my nuvoton.
> > > > > 
> > > > > If would be helpful if you can find the solution without
> > > > > reducing performance. I think it is a separate problem to
> > > > > address though. Maybe not worth to mix them in the same fix.
> > > > 
> > > > Well, if it works, no other fix is needed.
> > > > 
> > > > This is what I'm currently trying out on my nuvoton with the
> > > > timings reverted to being those in the vanilla kernel.  So far
> > > > it hasn't crashed, but I haven't run it for long enough to be
> > > > sure yet.
> > > > 
> > > > James
> > > 
> > > OK, so the bus does not like one byte reads but prefers full (32-
> > > bit) word reads? I.e. what's the context?
> > 
> > It's not supported by anything in the spec just empirical
> > observation.  However, the spec says the status register is 24
> > bits: the upper 16 being the burst count.  When we read the whole
> > status register, including the burst count, we do a read32. I
> > observed that the elongated timing was only added for the read8
> > code not the read32 which supports the theory that the former
> > causes the Atmel to crash but the latter doesn't.  Of course it's
> > always possible that probabilistically the Atmel is going to crash
> > on the burst count read, but that's exercised far less than the
> > status only read.
> 
> This paragraph is good enough explanation for me. Can you include it
> to the final commit as soon as we hear how your fix works for Hao?

Sure.  I'm afraid I have to report that it didn't work for me.  My
Nuvoton is definitely annoyed by the frequency of the prodding rather
than the register width.

James



^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: [PATCH] Fix Atmel TPM crash caused by too frequent queries
  2020-09-30 14:54                 ` James Bottomley
@ 2020-09-30 15:37                   ` Jarkko Sakkinen
  2020-09-30 20:48                     ` James Bottomley
  0 siblings, 1 reply; 91+ messages in thread
From: Jarkko Sakkinen @ 2020-09-30 15:37 UTC (permalink / raw)
  To: James Bottomley
  Cc: Hao Wu, peterhuewe, jgg, arnd, gregkh, Hamza Attak, nayna,
	why2jjj.linux, zohar, linux-integrity, Paul Menzel, Ken Goldman,
	Seungyeop Han, Shrihari Kalkar, Anish Jhaveri

On Wed, Sep 30, 2020 at 07:54:58AM -0700, James Bottomley wrote:
> On Wed, 2020-09-30 at 05:16 +0300, Jarkko Sakkinen wrote:
> > On Mon, Sep 28, 2020 at 03:11:39PM -0700, James Bottomley wrote:
> > > On Sun, 2020-09-27 at 22:59 -0700, Hao Wu wrote:
> > > [...]
> > > > > However, there is another possibility: it's something to do
> > > > > with the byte read; I notice you don't require the same
> > > > > slowdown for the burst count read, which actually reads the
> > > > > status register and burst count as a read32.  If that really is
> > > > > the case, for the atmel would substituting a read32 and just
> > > > > throwing the upper bytes away in tpm_tis_status() allow us to
> > > > > keep the current timings?  I can actually try doing this and
> > > > > see if it fixes my nuvoton.
> > > > 
> > > > If would be helpful if you can find the solution without reducing
> > > > performance. I think it is a separate problem to address though.
> > > > Maybe not worth to mix them in the same fix.
> > > 
> > > Well, if it works, no other fix is needed.
> > > 
> > > This is what I'm currently trying out on my nuvoton with the
> > > timings reverted to being those in the vanilla kernel.  So far it
> > > hasn't crashed, but I haven't run it for long enough to be sure
> > > yet.
> > > 
> > > James
> > 
> > OK, so the bus does not like one byte reads but prefers full (32-bit)
> > word reads? I.e. what's the context?
> 
> It's not supported by anything in the spec just empirical observation. 
> However, the spec says the status register is 24 bits: the upper 16
> being the burst count.  When we read the whole status register,
> including the burst count, we do a read32. I observed that the
> elongated timing was only added for the read8 code not the read32 which
> supports the theory that the former causes the Atmel to crash but the
> latter doesn't.  Of course it's always possible that probabilistically
> the Atmel is going to crash on the burst count read, but that's
> exercised far less than the status only read.

This paragraph is good enough explanation for me. Can you include it
to the final commit as soon as we hear how your fix works for Hao?

/Jarkko

^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: [PATCH] Fix Atmel TPM crash caused by too frequent queries
  2020-09-30  2:16               ` Jarkko Sakkinen
@ 2020-09-30 14:54                 ` James Bottomley
  2020-09-30 15:37                   ` Jarkko Sakkinen
  0 siblings, 1 reply; 91+ messages in thread
From: James Bottomley @ 2020-09-30 14:54 UTC (permalink / raw)
  To: Jarkko Sakkinen
  Cc: Hao Wu, peterhuewe, jgg, arnd, gregkh, Hamza Attak, nayna,
	why2jjj.linux, zohar, linux-integrity, Paul Menzel, Ken Goldman,
	Seungyeop Han, Shrihari Kalkar, Anish Jhaveri

On Wed, 2020-09-30 at 05:16 +0300, Jarkko Sakkinen wrote:
> On Mon, Sep 28, 2020 at 03:11:39PM -0700, James Bottomley wrote:
> > On Sun, 2020-09-27 at 22:59 -0700, Hao Wu wrote:
> > [...]
> > > > However, there is another possibility: it's something to do
> > > > with the byte read; I notice you don't require the same
> > > > slowdown for the burst count read, which actually reads the
> > > > status register and burst count as a read32.  If that really is
> > > > the case, for the atmel would substituting a read32 and just
> > > > throwing the upper bytes away in tpm_tis_status() allow us to
> > > > keep the current timings?  I can actually try doing this and
> > > > see if it fixes my nuvoton.
> > > 
> > > If would be helpful if you can find the solution without reducing
> > > performance. I think it is a separate problem to address though.
> > > Maybe not worth to mix them in the same fix.
> > 
> > Well, if it works, no other fix is needed.
> > 
> > This is what I'm currently trying out on my nuvoton with the
> > timings reverted to being those in the vanilla kernel.  So far it
> > hasn't crashed, but I haven't run it for long enough to be sure
> > yet.
> > 
> > James
> 
> OK, so the bus does not like one byte reads but prefers full (32-bit)
> word reads? I.e. what's the context?

It's not supported by anything in the spec just empirical observation. 
However, the spec says the status register is 24 bits: the upper 16
being the burst count.  When we read the whole status register,
including the burst count, we do a read32. I observed that the
elongated timing was only added for the read8 code not the read32 which
supports the theory that the former causes the Atmel to crash but the
latter doesn't.  Of course it's always possible that probabilistically
the Atmel is going to crash on the burst count read, but that's
exercised far less than the status only read.

James
 


^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: [PATCH] Fix Atmel TPM crash caused by too frequent queries
  2020-09-30  2:11                   ` Jarkko Sakkinen
@ 2020-09-30  3:41                     ` Hao Wu
       [not found]                       ` <EA1EE8F8-F054-4E1B-B830-231398D33CB8@rubrik.com>
  0 siblings, 1 reply; 91+ messages in thread
From: Hao Wu @ 2020-09-30  3:41 UTC (permalink / raw)
  To: Jarkko Sakkinen
  Cc: James Bottomley, Nayna Jain, peterhuewe, jgg, arnd, gregkh,
	Hamza Attak, why2jjj.linux, zohar, linux-integrity, Paul Menzel,
	Ken Goldman, Seungyeop Han, Shrihari Kalkar, Anish Jhaveri


> On Sep 29, 2020, at 7:11 PM, Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com> wrote:
> 
> On Mon, Sep 28, 2020 at 01:27:14PM -0700, Hao Wu wrote:
>> 
>> 
>>> On Sep 28, 2020, at 12:47 PM, Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com> wrote:
>>> 
>>> On Mon, Sep 28, 2020 at 10:49:56AM -0700, Hao Wu wrote:
>>>> Hi Jarkko,
>>>> 
>>>> https://github.com/torvalds/linux/commit/9f3fc7bcddcb51234e23494531f93ab60475e1c3 
>>>> Is the one introducing the issue since 4.14. Then the other three commits
>>>> changed the relevant code a bit. Probably you can check the timestamp / release version
>>>> on each commit to understand the relationship.
>>>> 
>>>> I think the original patch commit message can help you understand the root cause.
>>>> Attaching the commit here for your convenience.
>>>> 
>>>> Thanks
>>>> Hao
>>> 
>>> Please, again, when you respond quote properly instead of putting your
>>> response on top. Thank you.
>>> 
>>> Yes, I know the issue and it is already documented also in the James'
>>> earlier patch that did a similar change. I.e. for some reason some TPM's
>>> (or the bus itself) do not like poking it too often.
>> Yes, probably. Although the issue James’s patch fixes has the same error code,
>> it is about a different issue which is similar.
> 
> OK, great.
> 
>>> So: what if you revert on using msleep(TPM_TIMEOUT) in
>>> wait_for_tpm_stat(), i.e. revert to the behaviour before the
>>> aformentioned commit?
>> I believe that should resolve the issue as well
> 
> I'd return to the old code that works instead of doing something new
> along the lines. James?
> 
I would not use msleep back which is actually wrong way to do. 
We don’t know the actual time it sleeps on different system in the future.
Currently, my measurement over msleep(TPM_TIMEOUT) , i.e. msleep(5) 
sleeps 15ms. Maybe we should use tpm_msleep to precisely do the sleep.

I will test out James’ patch and your proposal this week and get you back anyway. 

> Anyway, thanks a lot for coming with this. I think we are making at
> least some progress sorting this out.
> 
> Also want to underline that my comments about quoting emails did not
> have anything to do that I would not appreciate this feedback. It is
> just a "protocol thing".
No worries. I am not familiar with the rules here. Thank you for corrections.

> 
>> Thanks
>> Hao
> 
> /Jarkko

Hao


^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: [PATCH] Fix Atmel TPM crash caused by too frequent queries
  2020-09-28 22:11             ` James Bottomley
  2020-09-29  4:46               ` Hao Wu
@ 2020-09-30  2:16               ` Jarkko Sakkinen
  2020-09-30 14:54                 ` James Bottomley
  1 sibling, 1 reply; 91+ messages in thread
From: Jarkko Sakkinen @ 2020-09-30  2:16 UTC (permalink / raw)
  To: James Bottomley
  Cc: Hao Wu, peterhuewe, jgg, arnd, gregkh, Hamza Attak, nayna,
	why2jjj.linux, zohar, linux-integrity, Paul Menzel, Ken Goldman,
	Seungyeop Han, Shrihari Kalkar, Anish Jhaveri

On Mon, Sep 28, 2020 at 03:11:39PM -0700, James Bottomley wrote:
> On Sun, 2020-09-27 at 22:59 -0700, Hao Wu wrote:
> [...]
> > > However, there is another possibility: it's something to do with
> > > the byte read; I notice you don't require the same slowdown for the
> > > burst count read, which actually reads the status register and
> > > burst count as a read32.  If that really is the case, for the atmel
> > > would substituting a read32 and just throwing the upper bytes away
> > > in tpm_tis_status() allow us to keep the current timings?  I can
> > > actually try doing this and see if it fixes my nuvoton.
> > 
> > If would be helpful if you can find the solution without reducing
> > performance. I think it is a separate problem to address though.
> > Maybe not worth to mix them in the same fix.
> 
> Well, if it works, no other fix is needed.
> 
> This is what I'm currently trying out on my nuvoton with the timings
> reverted to being those in the vanilla kernel.  So far it hasn't
> crashed, but I haven't run it for long enough to be sure yet.
> 
> James

OK, so the bus does not like one byte reads but prefers full (32-bit)
word reads? I.e. what's the context?

> ---
> 
> diff --git a/drivers/char/tpm/tpm_tis_core.c b/drivers/char/tpm/tpm_tis_core.c
> index 6b884badabe7..c4dbac8edc9b 100644
> --- a/drivers/char/tpm/tpm_tis_core.c
> +++ b/drivers/char/tpm/tpm_tis_core.c
> @@ -233,9 +233,9 @@ static u8 tpm_tis_status(struct tpm_chip *chip)
>  {
>  	struct tpm_tis_data *priv = dev_get_drvdata(&chip->dev);
>  	int rc;
> -	u8 status;
> +	u32 status;
>  
> -	rc = tpm_tis_read8(priv, TPM_STS(priv->locality), &status);
> +	rc = tpm_tis_read32(priv, TPM_STS(priv->locality), &status);
>  	if (rc < 0)
>  		return 0;
>  
> 

/Jarkko

^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: [PATCH] Fix Atmel TPM crash caused by too frequent queries
  2020-09-28 20:27                 ` Hao Wu
@ 2020-09-30  2:11                   ` Jarkko Sakkinen
  2020-09-30  3:41                     ` Hao Wu
  0 siblings, 1 reply; 91+ messages in thread
From: Jarkko Sakkinen @ 2020-09-30  2:11 UTC (permalink / raw)
  To: Hao Wu
  Cc: James Bottomley, Nayna Jain, peterhuewe, jgg, arnd, gregkh,
	Hamza Attak, why2jjj.linux, zohar, linux-integrity, Paul Menzel,
	Ken Goldman, Seungyeop Han, Shrihari Kalkar, Anish Jhaveri

On Mon, Sep 28, 2020 at 01:27:14PM -0700, Hao Wu wrote:
> 
> 
> > On Sep 28, 2020, at 12:47 PM, Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com> wrote:
> > 
> > On Mon, Sep 28, 2020 at 10:49:56AM -0700, Hao Wu wrote:
> >> Hi Jarkko,
> >> 
> >> https://github.com/torvalds/linux/commit/9f3fc7bcddcb51234e23494531f93ab60475e1c3 
> >> Is the one introducing the issue since 4.14. Then the other three commits
> >> changed the relevant code a bit. Probably you can check the timestamp / release version
> >> on each commit to understand the relationship.
> >> 
> >> I think the original patch commit message can help you understand the root cause.
> >> Attaching the commit here for your convenience.
> >> 
> >> Thanks
> >> Hao
> > 
> > Please, again, when you respond quote properly instead of putting your
> > response on top. Thank you.
> > 
> > Yes, I know the issue and it is already documented also in the James'
> > earlier patch that did a similar change. I.e. for some reason some TPM's
> > (or the bus itself) do not like poking it too often.
> Yes, probably. Although the issue James’s patch fixes has the same error code,
> it is about a different issue which is similar.

OK, great.

> > So: what if you revert on using msleep(TPM_TIMEOUT) in
> > wait_for_tpm_stat(), i.e. revert to the behaviour before the
> > aformentioned commit?
> I believe that should resolve the issue as well

I'd return to the old code that works instead of doing something new
along the lines. James?

Anyway, thanks a lot for coming with this. I think we are making at
least some progress sorting this out.

Also want to underline that my comments about quoting emails did not
have anything to do that I would not appreciate this feedback. It is
just a "protocol thing".

> Thanks
> Hao

/Jarkko

^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: [PATCH] Fix Atmel TPM crash caused by too frequent queries
  2020-09-28 22:11             ` James Bottomley
@ 2020-09-29  4:46               ` Hao Wu
  2020-09-30  2:16               ` Jarkko Sakkinen
  1 sibling, 0 replies; 91+ messages in thread
From: Hao Wu @ 2020-09-29  4:46 UTC (permalink / raw)
  To: James Bottomley
  Cc: peterhuewe, jarkko.sakkinen, jgg, arnd, gregkh, Hamza Attak,
	nayna, why2jjj.linux, zohar, linux-integrity, Paul Menzel,
	Ken Goldman, Seungyeop Han, Shrihari Kalkar, Anish Jhaveri



> On Sep 28, 2020, at 3:11 PM, James Bottomley <James.Bottomley@hansenpartnership.com> wrote:
> 
> On Sun, 2020-09-27 at 22:59 -0700, Hao Wu wrote:
> [...]
>>> However, there is another possibility: it's something to do with
>>> the byte read; I notice you don't require the same slowdown for the
>>> burst count read, which actually reads the status register and
>>> burst count as a read32.  If that really is the case, for the atmel
>>> would substituting a read32 and just throwing the upper bytes away
>>> in tpm_tis_status() allow us to keep the current timings?  I can
>>> actually try doing this and see if it fixes my nuvoton.
>> 
>> If would be helpful if you can find the solution without reducing
>> performance. I think it is a separate problem to address though.
>> Maybe not worth to mix them in the same fix.
> 
> Well, if it works, no other fix is needed.
> 
> This is what I'm currently trying out on my nuvoton with the timings
> reverted to being those in the vanilla kernel.  So far it hasn't
> crashed, but I haven't run it for long enough to be sure yet.
> 
> James
> 
> ---
> 
> diff --git a/drivers/char/tpm/tpm_tis_core.c b/drivers/char/tpm/tpm_tis_core.c
> index 6b884badabe7..c4dbac8edc9b 100644
> --- a/drivers/char/tpm/tpm_tis_core.c
> +++ b/drivers/char/tpm/tpm_tis_core.c
> @@ -233,9 +233,9 @@ static u8 tpm_tis_status(struct tpm_chip *chip)
> {
> 	struct tpm_tis_data *priv = dev_get_drvdata(&chip->dev);
> 	int rc;
> -	u8 status;
> +	u32 status;
> 
> -	rc = tpm_tis_read8(priv, TPM_STS(priv->locality), &status);
> +	rc = tpm_tis_read32(priv, TPM_STS(priv->locality), &status);
> 	if (rc < 0)
> 		return 0;
> 
> 

Thanks James for the quick patch. 
I will apply it this week any see whether it helps or not.

Hao

^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: [PATCH] Fix Atmel TPM crash caused by too frequent queries
  2020-09-28  5:59           ` Hao Wu
@ 2020-09-28 22:11             ` James Bottomley
  2020-09-29  4:46               ` Hao Wu
  2020-09-30  2:16               ` Jarkko Sakkinen
  0 siblings, 2 replies; 91+ messages in thread
From: James Bottomley @ 2020-09-28 22:11 UTC (permalink / raw)
  To: Hao Wu
  Cc: peterhuewe, jarkko.sakkinen, jgg, arnd, gregkh, Hamza Attak,
	nayna, why2jjj.linux, zohar, linux-integrity, Paul Menzel,
	Ken Goldman, Seungyeop Han, Shrihari Kalkar, Anish Jhaveri

On Sun, 2020-09-27 at 22:59 -0700, Hao Wu wrote:
[...]
> > However, there is another possibility: it's something to do with
> > the byte read; I notice you don't require the same slowdown for the
> > burst count read, which actually reads the status register and
> > burst count as a read32.  If that really is the case, for the atmel
> > would substituting a read32 and just throwing the upper bytes away
> > in tpm_tis_status() allow us to keep the current timings?  I can
> > actually try doing this and see if it fixes my nuvoton.
> 
> If would be helpful if you can find the solution without reducing
> performance. I think it is a separate problem to address though.
> Maybe not worth to mix them in the same fix.

Well, if it works, no other fix is needed.

This is what I'm currently trying out on my nuvoton with the timings
reverted to being those in the vanilla kernel.  So far it hasn't
crashed, but I haven't run it for long enough to be sure yet.

James

---

diff --git a/drivers/char/tpm/tpm_tis_core.c b/drivers/char/tpm/tpm_tis_core.c
index 6b884badabe7..c4dbac8edc9b 100644
--- a/drivers/char/tpm/tpm_tis_core.c
+++ b/drivers/char/tpm/tpm_tis_core.c
@@ -233,9 +233,9 @@ static u8 tpm_tis_status(struct tpm_chip *chip)
 {
 	struct tpm_tis_data *priv = dev_get_drvdata(&chip->dev);
 	int rc;
-	u8 status;
+	u32 status;
 
-	rc = tpm_tis_read8(priv, TPM_STS(priv->locality), &status);
+	rc = tpm_tis_read32(priv, TPM_STS(priv->locality), &status);
 	if (rc < 0)
 		return 0;
 


^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: [PATCH] Fix Atmel TPM crash caused by too frequent queries
  2020-09-28 19:47               ` Jarkko Sakkinen
@ 2020-09-28 20:27                 ` Hao Wu
  2020-09-30  2:11                   ` Jarkko Sakkinen
  0 siblings, 1 reply; 91+ messages in thread
From: Hao Wu @ 2020-09-28 20:27 UTC (permalink / raw)
  To: Jarkko Sakkinen
  Cc: James Bottomley, Nayna Jain, peterhuewe, jgg, arnd, gregkh,
	Hamza Attak, why2jjj.linux, zohar, linux-integrity, Paul Menzel,
	Ken Goldman, Seungyeop Han, Shrihari Kalkar, Anish Jhaveri



> On Sep 28, 2020, at 12:47 PM, Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com> wrote:
> 
> On Mon, Sep 28, 2020 at 10:49:56AM -0700, Hao Wu wrote:
>> Hi Jarkko,
>> 
>> https://github.com/torvalds/linux/commit/9f3fc7bcddcb51234e23494531f93ab60475e1c3 
>> Is the one introducing the issue since 4.14. Then the other three commits
>> changed the relevant code a bit. Probably you can check the timestamp / release version
>> on each commit to understand the relationship.
>> 
>> I think the original patch commit message can help you understand the root cause.
>> Attaching the commit here for your convenience.
>> 
>> Thanks
>> Hao
> 
> Please, again, when you respond quote properly instead of putting your
> response on top. Thank you.
> 
> Yes, I know the issue and it is already documented also in the James'
> earlier patch that did a similar change. I.e. for some reason some TPM's
> (or the bus itself) do not like poking it too often.
Yes, probably. Although the issue James’s patch fixes has the same error code,
it is about a different issue which is similar.

> So: what if you revert on using msleep(TPM_TIMEOUT) in
> wait_for_tpm_stat(), i.e. revert to the behaviour before the
> aformentioned commit?
I believe that should resolve the issue as well

Thanks
Hao






^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: [PATCH] Fix Atmel TPM crash caused by too frequent queries
  2020-09-28 17:49             ` Hao Wu
@ 2020-09-28 19:47               ` Jarkko Sakkinen
  2020-09-28 20:27                 ` Hao Wu
  0 siblings, 1 reply; 91+ messages in thread
From: Jarkko Sakkinen @ 2020-09-28 19:47 UTC (permalink / raw)
  To: Hao Wu
  Cc: James Bottomley, Nayna Jain, peterhuewe, jgg, arnd, gregkh,
	Hamza Attak, why2jjj.linux, zohar, linux-integrity, Paul Menzel,
	Ken Goldman, Seungyeop Han, Shrihari Kalkar, Anish Jhaveri

On Mon, Sep 28, 2020 at 10:49:56AM -0700, Hao Wu wrote:
> Hi Jarkko,
> 
> https://github.com/torvalds/linux/commit/9f3fc7bcddcb51234e23494531f93ab60475e1c3 
> Is the one introducing the issue since 4.14. Then the other three commits
> changed the relevant code a bit. Probably you can check the timestamp / release version
> on each commit to understand the relationship.
> 
> I think the original patch commit message can help you understand the root cause.
> Attaching the commit here for your convenience.
> 
> Thanks
> Hao

Please, again, when you respond quote properly instead of putting your
response on top. Thank you.

Yes, I know the issue and it is already documented also in the James'
earlier patch that did a similar change. I.e. for some reason some TPM's
(or the bus itself) do not like poking it too often.

So: what if you revert on using msleep(TPM_TIMEOUT) in
wait_for_tpm_stat(), i.e. revert to the behaviour before the
aformentioned commit?

/Jarkko

^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: [PATCH] Fix Atmel TPM crash caused by too frequent queries
  2020-09-28 14:16           ` Jarkko Sakkinen
@ 2020-09-28 17:49             ` Hao Wu
  2020-09-28 19:47               ` Jarkko Sakkinen
  0 siblings, 1 reply; 91+ messages in thread
From: Hao Wu @ 2020-09-28 17:49 UTC (permalink / raw)
  To: Jarkko Sakkinen
  Cc: James Bottomley, Nayna Jain, peterhuewe, jgg, arnd, gregkh,
	Hamza Attak, why2jjj.linux, zohar, linux-integrity, Paul Menzel,
	Ken Goldman, Seungyeop Han, Shrihari Kalkar, Anish Jhaveri

Hi Jarkko,

https://github.com/torvalds/linux/commit/9f3fc7bcddcb51234e23494531f93ab60475e1c3 
Is the one introducing the issue since 4.14. Then the other three commits
changed the relevant code a bit. Probably you can check the timestamp / release version
on each commit to understand the relationship.

I think the original patch commit message can help you understand the root cause.
Attaching the commit here for your convenience.

Thanks
Hao

-----

Since kernel 4.14, we fixed the TPM sleep logic
from msleep to usleep_range, so that the TPM
sleeps exactly with TPM_TIMEOUT (=5ms) afterward.
Before that fix, msleep(5) actually sleeps for
around 15ms.
The fix is https://github.com/torvalds/linux/commit/9f3fc7bcddcb51234e23494531f93ab60475e1c3

That fix uncovered that the TPM_TIMEOUT was not properly
set previously. We recently found the TPM driver in kernel 4.14+
(including 5.9-rc4) crashes Atmel TPM chips with
too frequent TPM queries.

The TPM crash signature is
```
$ tpm_sealdata -z
Tspi_Key_LoadKey failed: 0x00001087 - layer=tddl, code=0087 (135), I/O error

$ sudo dmesg | grep tpm0
[59154.665549] tpm tpm0: tpm_try_transmit: send(): error -62
[59154.809532] tpm tpm0: tpm_try_transmit: send(): error -62
```

From the error code "-62", it looks similar to another bug
https://patchwork.kernel.org/patch/10520247/
where the "TPM_TIMEOUT_USECS_MAX" and "TPM_TIMEOUT_USEC_MIN"
is too small, which causes TPM get queried too frequently,
and thus crashes.

We finally found the root cause is related to
the sleep timeout used in `wait_for_tpm_stat`

In the 4.16, commit
https://github.com/torvalds/linux/commit/cf151a9a44d52a63332e8e926234574fe5a5d784
uses `TPM_POLL_SLEEP` instead of TPM_TIMEOUT for `wait_for_tpm_stat`
and set `TPM_POLL_SLEEP` (1ms).

Since 4.18 commits
https://github.com/torvalds/linux/commit/59f5a6b07f6434efac0057dc2f303a96b871811b
https://github.com/torvalds/linux/commit/424eaf910c329ab06ad03a527ef45dcf6a328f00
further reduced the timeout in wait_for_tpm_stat to less than 1ms.

Our patch here defines a `TPM_TIMEOUT_WAIT_STAT` (15ms) just for
`wait_for_tpm_stat the` to fix the crash in Atmel chips,
but not introduce unnecessary performance regression
in other workflows.
15ms is the actual timeout TPM chips use before the 4.14 fix
from msleep to usleep_range. Thus fixed the crash.

Exploring smaller `TPM_TIMEOUT_WAIT_STAT` should be a separate
thing to revisit later. This patch meant to fix the regression
introduced since 4.14

Test Plan:
- Run fixed kernel on system with Atmel TPM chips
 and ensure crash does not happen
- Run fixed kernel on system with other TPM chips
 (IFX / WEC / STM) ensure not breakages from tpm-tool

Signed-off-by: Hao Wu <hao.wu@rubrik.com>
---
drivers/char/tpm/tpm.h          | 1 +
drivers/char/tpm/tpm_tis_core.c | 3 +--
2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/char/tpm/tpm.h b/drivers/char/tpm/tpm.h
index 947d1db0a5cc..899097ae9756 100644
--- a/drivers/char/tpm/tpm.h
+++ b/drivers/char/tpm/tpm.h
@@ -41,6 +41,7 @@ enum tpm_timeout {
	TPM_TIMEOUT_RETRY = 100, /* msecs */
	TPM_TIMEOUT_RANGE_US = 300,	/* usecs */
	TPM_TIMEOUT_POLL = 1,	/* msecs */
+	TPM_TIMEOUT_WAIT_STAT = 15,      /* msecs */
	TPM_TIMEOUT_USECS_MIN = 100,      /* usecs */
	TPM_TIMEOUT_USECS_MAX = 500      /* usecs */
};
diff --git a/drivers/char/tpm/tpm_tis_core.c b/drivers/char/tpm/tpm_tis_core.c
index 65ab1b027949..8aa5eef10c28 100644
--- a/drivers/char/tpm/tpm_tis_core.c
+++ b/drivers/char/tpm/tpm_tis_core.c
@@ -80,8 +80,7 @@ static int wait_for_tpm_stat(struct tpm_chip *chip, u8 mask,
		}
	} else {
		do {
-			usleep_range(TPM_TIMEOUT_USECS_MIN,
-				     TPM_TIMEOUT_USECS_MAX);
+			tpm_msleep(TPM_TIMEOUT_WAIT_STAT);
			status = chip->ops->status(chip);
			if ((status & mask) == mask)
				return 0;
-- 
2.17.1



> On Sep 28, 2020, at 7:16 AM, Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com> wrote:
> 
> On Sun, Sep 27, 2020 at 11:03:47PM -0700, Hao Wu wrote:
>> Hi Jarkko,
>> 
>> Just to be clear it is not caused by that single commit, but a few accumulated commits
>> https://github.com/torvalds/linux/commit/9f3fc7bcddcb51234e23494531f93ab60475e1c3
>> https://github.com/torvalds/linux/commit/cf151a9a44d52a63332e8e926234574fe5a5d784
>> https://github.com/torvalds/linux/commit/59f5a6b07f6434efac0057dc2f303a96b871811b
>> https://github.com/torvalds/linux/commit/424eaf910c329ab06ad03a527ef45dcf6a328f00
>> 
>> The easy way is probably just apply the patch I provided, and then revisit the value
>> for TPM_TIMEOUT_WAIT_STAT  
> 
> When you response, please quote properly, and do not top post.  The
> discussion is impossible to follow this way.
> 
> I'm not sure if I buy that. Which one is the first failing commit?
> 
> /Jarkko


^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: [PATCH] Fix Atmel TPM crash caused by too frequent queries
  2020-09-28  6:03         ` Hao Wu
@ 2020-09-28 14:16           ` Jarkko Sakkinen
  2020-09-28 17:49             ` Hao Wu
  0 siblings, 1 reply; 91+ messages in thread
From: Jarkko Sakkinen @ 2020-09-28 14:16 UTC (permalink / raw)
  To: Hao Wu
  Cc: James Bottomley, Nayna Jain, peterhuewe, jgg, arnd, gregkh,
	Hamza Attak, why2jjj.linux, zohar, linux-integrity, Paul Menzel,
	Ken Goldman, Seungyeop Han, Shrihari Kalkar, Anish Jhaveri

On Sun, Sep 27, 2020 at 11:03:47PM -0700, Hao Wu wrote:
> Hi Jarkko,
> 
> Just to be clear it is not caused by that single commit, but a few accumulated commits
> https://github.com/torvalds/linux/commit/9f3fc7bcddcb51234e23494531f93ab60475e1c3
> https://github.com/torvalds/linux/commit/cf151a9a44d52a63332e8e926234574fe5a5d784
> https://github.com/torvalds/linux/commit/59f5a6b07f6434efac0057dc2f303a96b871811b
> https://github.com/torvalds/linux/commit/424eaf910c329ab06ad03a527ef45dcf6a328f00
> 
> The easy way is probably just apply the patch I provided, and then revisit the value
> for TPM_TIMEOUT_WAIT_STAT  

When you response, please quote properly, and do not top post.  The
discussion is impossible to follow this way.

I'm not sure if I buy that. Which one is the first failing commit?

/Jarkko

^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: [PATCH] Fix Atmel TPM crash caused by too frequent queries
  2020-09-28  1:08       ` Jarkko Sakkinen
@ 2020-09-28  6:03         ` Hao Wu
  2020-09-28 14:16           ` Jarkko Sakkinen
  0 siblings, 1 reply; 91+ messages in thread
From: Hao Wu @ 2020-09-28  6:03 UTC (permalink / raw)
  To: Jarkko Sakkinen
  Cc: James Bottomley, Nayna Jain, peterhuewe, jgg, arnd, gregkh,
	Hamza Attak, why2jjj.linux, zohar, linux-integrity, Paul Menzel,
	Ken Goldman, Seungyeop Han, Shrihari Kalkar, Anish Jhaveri

Hi Jarkko,

Just to be clear it is not caused by that single commit, but a few accumulated commits
https://github.com/torvalds/linux/commit/9f3fc7bcddcb51234e23494531f93ab60475e1c3
https://github.com/torvalds/linux/commit/cf151a9a44d52a63332e8e926234574fe5a5d784
https://github.com/torvalds/linux/commit/59f5a6b07f6434efac0057dc2f303a96b871811b
https://github.com/torvalds/linux/commit/424eaf910c329ab06ad03a527ef45dcf6a328f00

The easy way is probably just apply the patch I provided, and then revisit the value
for TPM_TIMEOUT_WAIT_STAT  

Thanks
Hao

> 
> I think I should revert 424eaf910c329, up until more legit values are found.
> 
> /Jarkko


^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: [PATCH] Fix Atmel TPM crash caused by too frequent queries
  2020-09-28  1:22         ` James Bottomley
@ 2020-09-28  5:59           ` Hao Wu
  2020-09-28 22:11             ` James Bottomley
  0 siblings, 1 reply; 91+ messages in thread
From: Hao Wu @ 2020-09-28  5:59 UTC (permalink / raw)
  To: James Bottomley
  Cc: peterhuewe, jarkko.sakkinen, jgg, arnd, gregkh, Hamza Attak,
	nayna, why2jjj.linux, zohar, linux-integrity, Paul Menzel,
	Ken Goldman, Seungyeop Han, Shrihari Kalkar, Anish Jhaveri

Hi James,

>  Upping that to 15ms introduces a 100x delay in
> our status wait for the TPM to become ready, potentially slowing down
> all TIS TPM operations by 100x, which will hit us most with the PCR
> writes we do for IMA logging ... that seems like a bad bargain for a
> single TPM family manufacturer.
1. It is unlike to be 100x delay 
According to 
https://github.com/torvalds/linux/commit/59f5a6b07f6434efac0057dc2f303a96b871811b
https://github.com/torvalds/linux/commit/424eaf910c329ab06ad03a527ef45dcf6a328f00
It only optimize from 14sec to 7sec. Which is only a 2x speed up by using sleep time from 5ms to >1ms.
Here we change it back to 15 ms is very unlikely to have 100x delay. 
The optimization does not worth to have a breakage on chip from one manufacturer.

2. In my opinion, the kernel should support all manufacturers which were supported before. 
Not supporting any of them would lead to kernel divergence, because those chip users have to
Use it anyway. Maybe we can see the maintainers’ opinion on this.

3. I am kind of opposing to coming up smaller values without doing comprehensive
qualification on all supported manufacturers. Stable is probably more important for such software.
Looking back to these commits that introduced the breakages, only one or two
chips are tested. If that is a common case, probably we should refactor
the TPM driver to better support per-manufacturer configuration?    

> However, there is another possibility: it's something to do with the
> byte read; I notice you don't require the same slowdown for the burst
> count read, which actually reads the status register and burst count as
> a read32.  If that really is the case, for the atmel would substituting
> a read32 and just throwing the upper bytes away in tpm_tis_status()
> allow us to keep the current timings?  I can actually try doing this
> and see if it fixes my nuvoton.

If would be helpful if you can find the solution without reducing performance.
I think it is a separate problem to address though. Maybe not worth to mix
them in the same fix.

Thanks
Hao

> On Sep 27, 2020, at 6:22 PM, James Bottomley <James.Bottomley@HansenPartnership.com> wrote:
> 
> On Sun, 2020-09-27 at 17:11 -0700, Hao Wu wrote:
>> Hi James,
>> 
>> Maybe there is a misunderstanding. Here I am using tpm_msleep, not
>> msleep. tpm_msleep is using usleep underlaying. See
>> https://github.com/torvalds/linux/blob/master/drivers/char/tpm/tpm.h#L188
> 
> Right, I had missed that.
> 
>> The reasons I choose 15ms, is because before 
>> https://github.com/torvalds/linux/commit/9f3fc7bcddcb51234e23494531f93ab60475e1c3
>> (Where msleep is changed to tpm_msleep (which is essentially
>> usleep)), The actual sleep time is 15ms, thus here we change this
>> back to 15ms to fix regression.
> 
> Right now most TIS TPMs operate successfully with a sleep in there of
> the range 0.1-0.5ms.  Upping that to 15ms introduces a 100x delay in
> our status wait for the TPM to become ready, potentially slowing down
> all TIS TPM operations by 100x, which will hit us most with the PCR
> writes we do for IMA logging ... that seems like a bad bargain for a
> single TPM family manufacturer.
> 
> However, there is another possibility: it's something to do with the
> byte read; I notice you don't require the same slowdown for the burst
> count read, which actually reads the status register and burst count as
> a read32.  If that really is the case, for the atmel would substituting
> a read32 and just throwing the upper bytes away in tpm_tis_status()
> allow us to keep the current timings?  I can actually try doing this
> and see if it fixes my nuvoton.
> 
> James
> 
> 


^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: [PATCH] Fix Atmel TPM crash caused by too frequent queries
  2020-09-28  0:11       ` Hao Wu
  2020-09-28  0:15         ` Hao Wu
@ 2020-09-28  1:22         ` James Bottomley
  2020-09-28  5:59           ` Hao Wu
  1 sibling, 1 reply; 91+ messages in thread
From: James Bottomley @ 2020-09-28  1:22 UTC (permalink / raw)
  To: Hao Wu
  Cc: peterhuewe, jarkko.sakkinen, jgg, arnd, gregkh, Hamza Attak,
	nayna, why2jjj.linux, zohar, linux-integrity, Paul Menzel,
	Ken Goldman, Seungyeop Han, Shrihari Kalkar, Anish Jhaveri

On Sun, 2020-09-27 at 17:11 -0700, Hao Wu wrote:
> Hi James,
> 
> Maybe there is a misunderstanding. Here I am using tpm_msleep, not
> msleep. tpm_msleep is using usleep underlaying. See
> https://github.com/torvalds/linux/blob/master/drivers/char/tpm/tpm.h#L188

Right, I had missed that.

> The reasons I choose 15ms, is because before 
> https://github.com/torvalds/linux/commit/9f3fc7bcddcb51234e23494531f93ab60475e1c3
> (Where msleep is changed to tpm_msleep (which is essentially
> usleep)), The actual sleep time is 15ms, thus here we change this
> back to 15ms to fix regression.

Right now most TIS TPMs operate successfully with a sleep in there of
the range 0.1-0.5ms.  Upping that to 15ms introduces a 100x delay in
our status wait for the TPM to become ready, potentially slowing down
all TIS TPM operations by 100x, which will hit us most with the PCR
writes we do for IMA logging ... that seems like a bad bargain for a
single TPM family manufacturer.

However, there is another possibility: it's something to do with the
byte read; I notice you don't require the same slowdown for the burst
count read, which actually reads the status register and burst count as
a read32.  If that really is the case, for the atmel would substituting
a read32 and just throwing the upper bytes away in tpm_tis_status()
allow us to keep the current timings?  I can actually try doing this
and see if it fixes my nuvoton.

James



^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: [PATCH] Fix Atmel TPM crash caused by too frequent queries
  2020-09-27 18:25     ` James Bottomley
  2020-09-28  0:11       ` Hao Wu
@ 2020-09-28  1:08       ` Jarkko Sakkinen
  2020-09-28  6:03         ` Hao Wu
  1 sibling, 1 reply; 91+ messages in thread
From: Jarkko Sakkinen @ 2020-09-28  1:08 UTC (permalink / raw)
  To: James Bottomley, Nayna Jain
  Cc: Hao Wu, peterhuewe, jgg, arnd, gregkh, Hamza Attak, nayna,
	why2jjj.linux, zohar, linux-integrity, Paul Menzel, Ken Goldman,
	Seungyeop Han, Shrihari Kalkar, Anish Jhaveri

On Sun, Sep 27, 2020 at 11:25:39AM -0700, James Bottomley wrote:
> On Sat, 2020-09-26 at 16:10 -0700, Hao Wu wrote:
> > Resending following email in plaintext.
> > 
> > ----
> > 
> > Hi James,
> > 
> > Thanks for following up.
> > 
> > We have actually tried change 
> > TPM_TIMEOUT_USECS_MIN / TPM_TIMEOUT_USECS_MAX 
> > according to https://patchwork.kernel.org/patch/10520247/
> > It does not solve the problem for ATMEL chip. The chips facing crash
> > is 
> > not experimental, but happens commonly in 
> > the production systems we and our customers are using.
> > It is widely found in Cisco 220 / 240 systems which are using
> > Ateml chips.
> 
> Well, I came up with the values in that patch by trial and error ....
> all I know is they work for my nuvoton. If they're not right for you,
> see if you can find what values actually do work for your TPM.  The
> difference between msleep and usleep_range is that the former can have
> an indefinitely long timeout and the latter has a range bounded one. 
> If you think msleep works for you, the chances are it doesn't and
> you're relying on the large upper bound to make the bug infrequent
> enough for you not to see it.  Playing with the values in usleep range
> will help you find what the actual timeout is and eliminate the problem
> for good.
> 
> James

I think I should revert 424eaf910c329, up until more legit values are found.

/Jarkko

^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: [PATCH] Fix Atmel TPM crash caused by too frequent queries
  2020-09-28  0:11       ` Hao Wu
@ 2020-09-28  0:15         ` Hao Wu
  2020-09-28  1:22         ` James Bottomley
  1 sibling, 0 replies; 91+ messages in thread
From: Hao Wu @ 2020-09-28  0:15 UTC (permalink / raw)
  To: Hao Wu
  Cc: James Bottomley, peterhuewe, jarkko.sakkinen, jgg, arnd, gregkh,
	Hamza Attak, nayna, why2jjj.linux, zohar, linux-integrity,
	Paul Menzel, Ken Goldman, Seungyeop Han, Shrihari Kalkar,
	Anish Jhaveri

I am attaching the original bug report to this thread for new reviewers to get better context

---

Hi TPM Driver Maintainers,

We are from Rubrik engineering team. We found a TPM driver update since kernel 4.14 causing atmel TPM chips crash. We have root caused it and have a patch on the kernel used by Rubrik products. Given the “bug” has impact on not just Rubrik products, but all atmel TPM chips, we are asking to fix the issue in the kernel upstream.

The commit that introduced the bug since 4.14 
https://github.com/torvalds/linux/commit/9f3fc7bcddcb51234e23494531f93ab60475e1c3

Effected platform / system:
- Cisco UCS C220 M5 with atmel TPM chip
	- Ubuntu 16.04
  	- Kernel 4.14 / 4.15 / 4.18 / 4.20 / 5.8 / 5.9-rc4
- Cisco UCS C240 M5 with atmel TPM chip
	- Ubuntu 16.04
  	- Kernel 4.14 / 4.15 / 4.18 / 4.20 / 5.8 / 5.9-rc4

```
# TPM chip used in the problematic platform
$ tpm_version
TPM 1.2 Version Info:
  Chip Version:        1.2.66.1
  Spec Level:          2
  Errata Revision:     3
  TPM Vendor ID:       ATML
  TPM Version:         01010000
  Manufacturer Info:   41544d4c
```

Not all Cisco server nodes are facing the crash, but there are a good amount of portion (around 50% from nodes in Rubrik) are persistently having the TPM crash issue.

Our other platforms using TPM chips from other vendors do not have issue. These platform are running the same software as the problematic platforms run. Those TPM non-effected vendors are
- IFX
- STM
- WEC

TPM crash signature:
```
# error when running tpm tool
$ tpm_sealdata -z
Tspi_Key_LoadKey failed: 0x00001087 - layer=tddl, code=0087 (135), I/O error

# tpm driver sends error
$ sudo dmesg | grep tpm0
[59154.665549] tpm tpm0: tpm_try_transmit: send(): error -62
[59154.809532] tpm tpm0: tpm_try_transmit: send(): error -62
```

Our Root Cause Analysis
From the error code “-62”, it looks similar to another bug https://patchwork.kernel.org/patch/10520247/
where the “TPM_TIMEOUT_USECS_MAX” and “TPM_TIMEOUT_USEC_MIN” is too small, which causes TPM get queried too frequently, and thus crashes.

The issue for atmel TPM chip crash is similar the commit https://github.com/torvalds/linux/commit/9f3fc7bcddcb51234e23494531f93ab60475e1c3 changed TPM sleep logic from using `msleep` to `tpm_msleep`, in which `usleep_range` is used.

As what https://github.com/torvalds/linux/commit/9f3fc7bcddcb51234e23494531f93ab60475e1c3 intended to do, using usleep_range can make the sleep period shorter, because msleep actually sleeps longer when the sleep perioid is within 20ms, and usleep_range can make it more precise.

According to our experiments,
- usleep_range makes the TPM sleep precisely for TPM_TIMEOUT (i.e. 5ms https://github.com/torvalds/linux/blob/v4.14/drivers/char/tpm/tpm.h#L52)
- msleep(TPM_TIMEOUT) actually sleeps around 15ms    

Thus the TPM get queried more frequently than before, which is likely the root cause of the atmel chip crash. We fix it by bumping up the TPM_TIMEOUT to 15ms.


Rubrik Patch (patch only for 4.14)
```
diff --git a/drivers/char/tpm/tpm.h b/drivers/char/tpm/tpm.h
index 72d3ce4..9b8f3f8 100644
--- a/drivers/char/tpm/tpm.h
+++ b/drivers/char/tpm/tpm.h
@@ -49,7 +49,15 @@ enum tpm_const {
 };

 enum tpm_timeout {
-       TPM_TIMEOUT = 5,        /* msecs */
+       TPM_TIMEOUT = 15,       /* msecs */
        TPM_TIMEOUT_RETRY = 100, /* msecs */
        TPM_TIMEOUT_RANGE_US = 300      /* usecs */
 };
```
With the patch, the atmel TPM chip crash is fixed.  

Proposal
We want the kernel upstream to adopt the fix or have a better fix for the atmel chip while not bring performance regression for other TPM chips. We understand that https://github.com/torvalds/linux/commit/9f3fc7bcddcb51234e23494531f93ab60475e1c3 was intended to shorten the TPM respond time, but it does not work well for atmel TPM chips. Probably we should override TPM_TIMEOUT value for atmel chips at least.

Thanks
Hao

> On Sep 27, 2020, at 5:11 PM, Hao Wu <hao.wu@rubrik.com> wrote:
> 
> Hi James,
> 
> Maybe there is a misunderstanding. Here I am using tpm_msleep, not msleep.
> tpm_msleep is using usleep underlaying. See
> https://github.com/torvalds/linux/blob/master/drivers/char/tpm/tpm.h#L188
> 
> The reasons I choose 15ms, is because before 
> https://github.com/torvalds/linux/commit/9f3fc7bcddcb51234e23494531f93ab60475e1c3
> (Where msleep is changed to tpm_msleep (which is essentially usleep)),
> The actual sleep time is 15ms, thus here we change this back to 15ms to fix
> regression.
> 
> Thanks
> Hao 
> 
>> On Sep 27, 2020, at 11:25 AM, James Bottomley <James.Bottomley@HansenPartnership.com> wrote:
>> 
>> On Sat, 2020-09-26 at 16:10 -0700, Hao Wu wrote:
>>> Resending following email in plaintext.
>>> 
>>> ----
>>> 
>>> Hi James,
>>> 
>>> Thanks for following up.
>>> 
>>> We have actually tried change 
>>> TPM_TIMEOUT_USECS_MIN / TPM_TIMEOUT_USECS_MAX 
>>> according to https://patchwork.kernel.org/patch/10520247/
>>> It does not solve the problem for ATMEL chip. The chips facing crash
>>> is 
>>> not experimental, but happens commonly in 
>>> the production systems we and our customers are using.
>>> It is widely found in Cisco 220 / 240 systems which are using
>>> Ateml chips.
>> 
>> Well, I came up with the values in that patch by trial and error ....
>> all I know is they work for my nuvoton. If they're not right for you,
>> see if you can find what values actually do work for your TPM.  The
>> difference between msleep and usleep_range is that the former can have
>> an indefinitely long timeout and the latter has a range bounded one. 
>> If you think msleep works for you, the chances are it doesn't and
>> you're relying on the large upper bound to make the bug infrequent
>> enough for you not to see it.  Playing with the values in usleep range
>> will help you find what the actual timeout is and eliminate the problem
>> for good.
>> 
>> James
> 


^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: [PATCH] Fix Atmel TPM crash caused by too frequent queries
  2020-09-27 18:25     ` James Bottomley
@ 2020-09-28  0:11       ` Hao Wu
  2020-09-28  0:15         ` Hao Wu
  2020-09-28  1:22         ` James Bottomley
  2020-09-28  1:08       ` Jarkko Sakkinen
  1 sibling, 2 replies; 91+ messages in thread
From: Hao Wu @ 2020-09-28  0:11 UTC (permalink / raw)
  To: James Bottomley
  Cc: peterhuewe, jarkko.sakkinen, jgg, arnd, gregkh, Hamza Attak,
	nayna, why2jjj.linux, zohar, linux-integrity, Paul Menzel,
	Ken Goldman, Seungyeop Han, Shrihari Kalkar, Anish Jhaveri

Hi James,

Maybe there is a misunderstanding. Here I am using tpm_msleep, not msleep.
tpm_msleep is using usleep underlaying. See
https://github.com/torvalds/linux/blob/master/drivers/char/tpm/tpm.h#L188

The reasons I choose 15ms, is because before 
https://github.com/torvalds/linux/commit/9f3fc7bcddcb51234e23494531f93ab60475e1c3
(Where msleep is changed to tpm_msleep (which is essentially usleep)),
The actual sleep time is 15ms, thus here we change this back to 15ms to fix
regression.

Thanks
Hao 

> On Sep 27, 2020, at 11:25 AM, James Bottomley <James.Bottomley@HansenPartnership.com> wrote:
> 
> On Sat, 2020-09-26 at 16:10 -0700, Hao Wu wrote:
>> Resending following email in plaintext.
>> 
>> ----
>> 
>> Hi James,
>> 
>> Thanks for following up.
>> 
>> We have actually tried change 
>> TPM_TIMEOUT_USECS_MIN / TPM_TIMEOUT_USECS_MAX 
>> according to https://patchwork.kernel.org/patch/10520247/
>> It does not solve the problem for ATMEL chip. The chips facing crash
>> is 
>> not experimental, but happens commonly in 
>> the production systems we and our customers are using.
>> It is widely found in Cisco 220 / 240 systems which are using
>> Ateml chips.
> 
> Well, I came up with the values in that patch by trial and error ....
> all I know is they work for my nuvoton. If they're not right for you,
> see if you can find what values actually do work for your TPM.  The
> difference between msleep and usleep_range is that the former can have
> an indefinitely long timeout and the latter has a range bounded one. 
> If you think msleep works for you, the chances are it doesn't and
> you're relying on the large upper bound to make the bug infrequent
> enough for you not to see it.  Playing with the values in usleep range
> will help you find what the actual timeout is and eliminate the problem
> for good.
> 
> James


^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: [PATCH] Fix Atmel TPM crash caused by too frequent queries
  2020-09-26 23:10   ` Hao Wu
@ 2020-09-27 18:25     ` James Bottomley
  2020-09-28  0:11       ` Hao Wu
  2020-09-28  1:08       ` Jarkko Sakkinen
  0 siblings, 2 replies; 91+ messages in thread
From: James Bottomley @ 2020-09-27 18:25 UTC (permalink / raw)
  To: Hao Wu
  Cc: peterhuewe, jarkko.sakkinen, jgg, arnd, gregkh, Hamza Attak,
	nayna, why2jjj.linux, zohar, linux-integrity, Paul Menzel,
	Ken Goldman, Seungyeop Han, Shrihari Kalkar, Anish Jhaveri

On Sat, 2020-09-26 at 16:10 -0700, Hao Wu wrote:
> Resending following email in plaintext.
> 
> ----
> 
> Hi James,
> 
> Thanks for following up.
> 
> We have actually tried change 
> TPM_TIMEOUT_USECS_MIN / TPM_TIMEOUT_USECS_MAX 
> according to https://patchwork.kernel.org/patch/10520247/
> It does not solve the problem for ATMEL chip. The chips facing crash
> is 
> not experimental, but happens commonly in 
> the production systems we and our customers are using.
> It is widely found in Cisco 220 / 240 systems which are using
> Ateml chips.

Well, I came up with the values in that patch by trial and error ....
all I know is they work for my nuvoton. If they're not right for you,
see if you can find what values actually do work for your TPM.  The
difference between msleep and usleep_range is that the former can have
an indefinitely long timeout and the latter has a range bounded one. 
If you think msleep works for you, the chances are it doesn't and
you're relying on the large upper bound to make the bug infrequent
enough for you not to see it.  Playing with the values in usleep range
will help you find what the actual timeout is and eliminate the problem
for good.

James



^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: [PATCH] Fix Atmel TPM crash caused by too frequent queries
  2020-09-26 22:57 ` James Bottomley
@ 2020-09-26 23:10   ` Hao Wu
  2020-09-27 18:25     ` James Bottomley
  0 siblings, 1 reply; 91+ messages in thread
From: Hao Wu @ 2020-09-26 23:10 UTC (permalink / raw)
  To: James Bottomley
  Cc: peterhuewe, jarkko.sakkinen, jgg, arnd, gregkh, Hamza Attak,
	nayna, why2jjj.linux, zohar, linux-integrity, Paul Menzel,
	Ken Goldman, Seungyeop Han, Shrihari Kalkar, Anish Jhaveri

Resending following email in plaintext.

----

Hi James,

Thanks for following up.

We have actually tried change 
TPM_TIMEOUT_USECS_MIN / TPM_TIMEOUT_USECS_MAX 
according to https://patchwork.kernel.org/patch/10520247/
It does not solve the problem for ATMEL chip. The chips facing crash is 
not experimental, but happens commonly in 
the production systems we and our customers are using.
It is widely found in Cisco 220 / 240 systems which are using
Ateml chips.

Thanks
Hao  

> On Sep 26, 2020, at 3:57 PM, James Bottomley <James.Bottomley@hansenpartnership.com> wrote:
> 
> On Sat, 2020-09-26 at 15:31 -0700, Hao Wu wrote:
>> Since kernel 4.14, we fixed the TPM sleep logic
>> from msleep to usleep_range, so that the TPM
>> sleeps exactly with TPM_TIMEOUT (=5ms) afterward.
>> Before that fix, msleep(5) actually sleeps for
>> around 15ms.
>> The fix is 
>> https://github.com/torvalds/linux/commit/9f3fc7bcddcb51234e23494531f93ab60475e1c3
>> 
>> That fix uncovered that the TPM_TIMEOUT was not properly
>> set previously. We recently found the TPM driver in kernel 4.14+
>> (including 5.9-rc4) crashes Atmel TPM chips with
>> too frequent TPM queries.
> 
> I've seen this with my nuvoton too ... although it seems to be because
> my chip is somewhat experimental (SW upgrade from 1.2 to 2.0).  The
> problem with your patch is it reintroduces the massive delays that
> msleep has ... that's why usleep was used.  The patch I use locally to
> fix this keeps usleep, can you try it (attached).
> 
> James
> 
> ---
> 
> From d40a8c7691a72de28ea66a78bd177db36a79710a Mon Sep 17 00:00:00 2001
> From: James Bottomley <James.Bottomley@HansenPartnership.com>
> Date: Wed, 11 Jul 2018 10:11:14 -0700
> Subject: [PATCH] tpm.h: increase poll timings to fix tpm_tis regression
> 
> tpm_tis regressed recently to the point where the TPM being driven by
> it falls off the bus and cannot be contacted after some hours of use.
> This is the failure trace:
> 
> jejb@jarvis:~> dmesg|grep tpm
> [    3.282605] tpm_tis MSFT0101:00: 2.0 TPM (device-id 0xFE, rev-id 2)
> [14566.626614] tpm tpm0: Operation Timed out
> [14566.626621] tpm tpm0: tpm2_load_context: failed with a system error -62
> [14568.626607] tpm tpm0: tpm_try_transmit: tpm_send: error -62
> [14570.626594] tpm tpm0: tpm_try_transmit: tpm_send: error -62
> [14570.626605] tpm tpm0: tpm2_load_context: failed with a system error -62
> [14572.626526] tpm tpm0: tpm_try_transmit: tpm_send: error -62
> [14577.710441] tpm tpm0: tpm_try_transmit: tpm_send: error -62
> ...
> 
> The problem is caused by a change that caused us to poke the TPM far
> more often to see if it's ready.  Apparently something about the bus
> its on and the TPM means that it crashes or falls off the bus if you
> poke it too often and once this happens, only a reboot will recover
> it.
> 
> The fix I've come up with is to adjust the timings so the TPM no
> longer falls of the bus.  Obviously, this fix works for my Nuvoton
> NPCT6xxx but that's the only TPM I've tested it with.
> 
> Fixes: 424eaf910c32 tpm: reduce polling time to usecs for even finer granularity
> Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
> ---
> drivers/char/tpm/tpm.h | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/char/tpm/tpm.h b/drivers/char/tpm/tpm.h
> index 947d1db0a5cc..e4f4b98418ab 100644
> --- a/drivers/char/tpm/tpm.h
> +++ b/drivers/char/tpm/tpm.h
> @@ -41,8 +41,8 @@ enum tpm_timeout {
> 	TPM_TIMEOUT_RETRY = 100, /* msecs */
> 	TPM_TIMEOUT_RANGE_US = 300,	/* usecs */
> 	TPM_TIMEOUT_POLL = 1,	/* msecs */
> -	TPM_TIMEOUT_USECS_MIN = 100,      /* usecs */
> -	TPM_TIMEOUT_USECS_MAX = 500      /* usecs */
> +	TPM_TIMEOUT_USECS_MIN = 750,      /* usecs */
> +	TPM_TIMEOUT_USECS_MAX = 1000,      /* usecs */
> };
> 
> /* TPM addresses */
> -- 
> 2.26.2
> 
> 


^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: [PATCH] Fix Atmel TPM crash caused by too frequent queries
  2020-09-26 22:31 [PATCH] Fix " Hao Wu
@ 2020-09-26 22:57 ` James Bottomley
  2020-09-26 23:10   ` Hao Wu
  0 siblings, 1 reply; 91+ messages in thread
From: James Bottomley @ 2020-09-26 22:57 UTC (permalink / raw)
  To: Hao Wu, peterhuewe, jarkko.sakkinen, jgg, arnd, gregkh, hamza,
	james.l.morris, nayna, why2jjj.linux, zohar
  Cc: linux-integrity, pmenzel, kgold, seungyeop.han, shrihari.kalkar,
	anish.jhaveri

On Sat, 2020-09-26 at 15:31 -0700, Hao Wu wrote:
> Since kernel 4.14, we fixed the TPM sleep logic
> from msleep to usleep_range, so that the TPM
> sleeps exactly with TPM_TIMEOUT (=5ms) afterward.
> Before that fix, msleep(5) actually sleeps for
> around 15ms.
> The fix is 
> https://github.com/torvalds/linux/commit/9f3fc7bcddcb51234e23494531f93ab60475e1c3
> 
> That fix uncovered that the TPM_TIMEOUT was not properly
> set previously. We recently found the TPM driver in kernel 4.14+
> (including 5.9-rc4) crashes Atmel TPM chips with
> too frequent TPM queries.

I've seen this with my nuvoton too ... although it seems to be because
my chip is somewhat experimental (SW upgrade from 1.2 to 2.0).  The
problem with your patch is it reintroduces the massive delays that
msleep has ... that's why usleep was used.  The patch I use locally to
fix this keeps usleep, can you try it (attached).

James

---

From d40a8c7691a72de28ea66a78bd177db36a79710a Mon Sep 17 00:00:00 2001
From: James Bottomley <James.Bottomley@HansenPartnership.com>
Date: Wed, 11 Jul 2018 10:11:14 -0700
Subject: [PATCH] tpm.h: increase poll timings to fix tpm_tis regression

tpm_tis regressed recently to the point where the TPM being driven by
it falls off the bus and cannot be contacted after some hours of use.
This is the failure trace:

jejb@jarvis:~> dmesg|grep tpm
[    3.282605] tpm_tis MSFT0101:00: 2.0 TPM (device-id 0xFE, rev-id 2)
[14566.626614] tpm tpm0: Operation Timed out
[14566.626621] tpm tpm0: tpm2_load_context: failed with a system error -62
[14568.626607] tpm tpm0: tpm_try_transmit: tpm_send: error -62
[14570.626594] tpm tpm0: tpm_try_transmit: tpm_send: error -62
[14570.626605] tpm tpm0: tpm2_load_context: failed with a system error -62
[14572.626526] tpm tpm0: tpm_try_transmit: tpm_send: error -62
[14577.710441] tpm tpm0: tpm_try_transmit: tpm_send: error -62
...

The problem is caused by a change that caused us to poke the TPM far
more often to see if it's ready.  Apparently something about the bus
its on and the TPM means that it crashes or falls off the bus if you
poke it too often and once this happens, only a reboot will recover
it.

The fix I've come up with is to adjust the timings so the TPM no
longer falls of the bus.  Obviously, this fix works for my Nuvoton
NPCT6xxx but that's the only TPM I've tested it with.

Fixes: 424eaf910c32 tpm: reduce polling time to usecs for even finer granularity
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
---
 drivers/char/tpm/tpm.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/char/tpm/tpm.h b/drivers/char/tpm/tpm.h
index 947d1db0a5cc..e4f4b98418ab 100644
--- a/drivers/char/tpm/tpm.h
+++ b/drivers/char/tpm/tpm.h
@@ -41,8 +41,8 @@ enum tpm_timeout {
 	TPM_TIMEOUT_RETRY = 100, /* msecs */
 	TPM_TIMEOUT_RANGE_US = 300,	/* usecs */
 	TPM_TIMEOUT_POLL = 1,	/* msecs */
-	TPM_TIMEOUT_USECS_MIN = 100,      /* usecs */
-	TPM_TIMEOUT_USECS_MAX = 500      /* usecs */
+	TPM_TIMEOUT_USECS_MIN = 750,      /* usecs */
+	TPM_TIMEOUT_USECS_MAX = 1000,      /* usecs */
 };
 
 /* TPM addresses */
-- 
2.26.2



^ permalink raw reply	[flat|nested] 91+ messages in thread

* [PATCH] Fix Atmel TPM crash caused by too frequent queries
@ 2020-09-26 22:31 Hao Wu
  2020-09-26 22:57 ` James Bottomley
  0 siblings, 1 reply; 91+ messages in thread
From: Hao Wu @ 2020-09-26 22:31 UTC (permalink / raw)
  To: peterhuewe, jarkko.sakkinen, jgg, arnd, gregkh, hamza,
	james.l.morris, nayna, why2jjj.linux, zohar
  Cc: linux-integrity, pmenzel, kgold, seungyeop.han, shrihari.kalkar,
	anish.jhaveri, hao.wu

Since kernel 4.14, we fixed the TPM sleep logic
from msleep to usleep_range, so that the TPM
sleeps exactly with TPM_TIMEOUT (=5ms) afterward.
Before that fix, msleep(5) actually sleeps for
around 15ms.
The fix is https://github.com/torvalds/linux/commit/9f3fc7bcddcb51234e23494531f93ab60475e1c3

That fix uncovered that the TPM_TIMEOUT was not properly
set previously. We recently found the TPM driver in kernel 4.14+
(including 5.9-rc4) crashes Atmel TPM chips with
too frequent TPM queries.

The TPM crash signature is
```
$ tpm_sealdata -z
Tspi_Key_LoadKey failed: 0x00001087 - layer=tddl, code=0087 (135), I/O error

$ sudo dmesg | grep tpm0
[59154.665549] tpm tpm0: tpm_try_transmit: send(): error -62
[59154.809532] tpm tpm0: tpm_try_transmit: send(): error -62
```

From the error code "-62", it looks similar to another bug
https://patchwork.kernel.org/patch/10520247/
where the "TPM_TIMEOUT_USECS_MAX" and "TPM_TIMEOUT_USEC_MIN"
is too small, which causes TPM get queried too frequently,
and thus crashes.

We finally found the root cause is related to
the sleep timeout used in `wait_for_tpm_stat`

In the 4.16, commit
https://github.com/torvalds/linux/commit/cf151a9a44d52a63332e8e926234574fe5a5d784
uses `TPM_POLL_SLEEP` instead of TPM_TIMEOUT for `wait_for_tpm_stat`
and set `TPM_POLL_SLEEP` (1ms).

Since 4.18 commits
https://github.com/torvalds/linux/commit/59f5a6b07f6434efac0057dc2f303a96b871811b
https://github.com/torvalds/linux/commit/424eaf910c329ab06ad03a527ef45dcf6a328f00
further reduced the timeout in wait_for_tpm_stat to less than 1ms.

Our patch here defines a `TPM_TIMEOUT_WAIT_STAT` (15ms) just for
`wait_for_tpm_stat the` to fix the crash in Atmel chips,
but not introduce unnecessary performance regression
in other workflows.
15ms is the actual timeout TPM chips use before the 4.14 fix
from msleep to usleep_range. Thus fixed the crash.

Exploring smaller `TPM_TIMEOUT_WAIT_STAT` should be a separate
thing to revisit later. This patch meant to fix the regression
introduced since 4.14

Test Plan:
- Run fixed kernel on system with Atmel TPM chips
  and ensure crash does not happen
- Run fixed kernel on system with other TPM chips
  (IFX / WEC / STM) ensure not breakages from tpm-tool

Signed-off-by: Hao Wu <hao.wu@rubrik.com>
---
 drivers/char/tpm/tpm.h          | 1 +
 drivers/char/tpm/tpm_tis_core.c | 3 +--
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/char/tpm/tpm.h b/drivers/char/tpm/tpm.h
index 947d1db0a5cc..899097ae9756 100644
--- a/drivers/char/tpm/tpm.h
+++ b/drivers/char/tpm/tpm.h
@@ -41,6 +41,7 @@ enum tpm_timeout {
 	TPM_TIMEOUT_RETRY = 100, /* msecs */
 	TPM_TIMEOUT_RANGE_US = 300,	/* usecs */
 	TPM_TIMEOUT_POLL = 1,	/* msecs */
+	TPM_TIMEOUT_WAIT_STAT = 15,      /* msecs */
 	TPM_TIMEOUT_USECS_MIN = 100,      /* usecs */
 	TPM_TIMEOUT_USECS_MAX = 500      /* usecs */
 };
diff --git a/drivers/char/tpm/tpm_tis_core.c b/drivers/char/tpm/tpm_tis_core.c
index 65ab1b027949..8aa5eef10c28 100644
--- a/drivers/char/tpm/tpm_tis_core.c
+++ b/drivers/char/tpm/tpm_tis_core.c
@@ -80,8 +80,7 @@ static int wait_for_tpm_stat(struct tpm_chip *chip, u8 mask,
 		}
 	} else {
 		do {
-			usleep_range(TPM_TIMEOUT_USECS_MIN,
-				     TPM_TIMEOUT_USECS_MAX);
+			tpm_msleep(TPM_TIMEOUT_WAIT_STAT);
 			status = chip->ops->status(chip);
 			if ((status & mask) == mask)
 				return 0;
-- 
2.17.1


^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: [PATCH] Fix Atmel TPM crash caused by too frequent queries
  2020-09-14  6:13 Hao Wu
  2020-09-14  6:17 ` Greg KH
@ 2020-09-15  2:52 ` Hao Wu
  1 sibling, 0 replies; 91+ messages in thread
From: Hao Wu @ 2020-09-15  2:52 UTC (permalink / raw)
  To: peterhuewe, jarkko.sakkinen, jgg, arnd, gregkh, hamza, james.l.morris
  Cc: linux-integrity, pmenzel, kgold, seungyeop.han, shrihari.kalkar,
	anish.jhaveri

Applied the fix on 5.9-rc4, but it looks like it is not enough to fix the issue. There should be other TPM timing changes since 4.15 (where we have tested and ensure it works). Need to look into it a bit to see what should be the fix in the master branch.

Thanks
Hao

> On Sep 13, 2020, at 11:13 PM, Hao Wu <hao.wu@rubrik.com> wrote:
> 
> Since kernel 4.14, we fixed the TPM sleep logic
> from msleep to usleep_range, so that the TPM
> sleeps exactly with TPM_TIMEOUT (=5ms) afterward.
> Before that fix, msleep(5) actually sleeps for
> around 15ms.
> The fix is https://github.com/torvalds/linux/commit/9f3fc7bcddcb51234e23494531f93ab60475e1c3
> 
> That fix uncovered that the TPM_TIMEOUT was not properly
> set previously. We recently found the TPM driver in kernel 4.14+
> (including 5.9-rc4) crashes Atmel TPM chips with
> too frequent TPM queries.
> 
> The TPM crash signature is
> ```
> $ tpm_sealdata -z
> Tspi_Key_LoadKey failed: 0x00001087 - layer=tddl, code=0087 (135), I/O error
> 
> $ sudo dmesg | grep tpm0
> [59154.665549] tpm tpm0: tpm_try_transmit: send(): error -62
> [59154.809532] tpm tpm0: tpm_try_transmit: send(): error -62
> ```
> 
> From the error code "-62", it looks similar to another bug
> https://patchwork.kernel.org/patch/10520247/
> where the "TPM_TIMEOUT_USECS_MAX" and "TPM_TIMEOUT_USEC_MIN"
> is too small, which causes TPM get queried too frequently,
> and thus crashes.
> 
> This patch fix the TPM_TIMEOUT to 15ms which was
> the actual timeout TPM chips use before the fix
> from msleep to usleep_range. Thus fixed the crash.
> 
> Test Plan:
> - Run fixed kernel on system with Atmel TPM chips
>  and ensure crash does not happen
> - Run fixed kernel on system with other TPM chips
>  (IFX / WEC / STM) ensure not breakages from tpm-tool
> ---
> drivers/char/tpm/tpm.h | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/char/tpm/tpm.h b/drivers/char/tpm/tpm.h
> index 947d1db0a5cc..73259ac0a997 100644
> --- a/drivers/char/tpm/tpm.h
> +++ b/drivers/char/tpm/tpm.h
> @@ -37,7 +37,7 @@
> #define TPM_RETRY		50
> 
> enum tpm_timeout {
> -	TPM_TIMEOUT = 5,	/* msecs */
> +	TPM_TIMEOUT = 15,	/* msecs */
> 	TPM_TIMEOUT_RETRY = 100, /* msecs */
> 	TPM_TIMEOUT_RANGE_US = 300,	/* usecs */
> 	TPM_TIMEOUT_POLL = 1,	/* msecs */
> -- 
> 2.17.1
> 


^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: [PATCH] Fix Atmel TPM crash caused by too frequent queries
  2020-09-14  6:13 Hao Wu
@ 2020-09-14  6:17 ` Greg KH
  2020-09-15  2:52 ` Hao Wu
  1 sibling, 0 replies; 91+ messages in thread
From: Greg KH @ 2020-09-14  6:17 UTC (permalink / raw)
  To: Hao Wu
  Cc: peterhuewe, jarkko.sakkinen, jgg, arnd, hamza, james.l.morris,
	linux-integrity, pmenzel, kgold, seungyeop.han, shrihari.kalkar,
	anish.jhaveri

On Sun, Sep 13, 2020 at 11:13:43PM -0700, Hao Wu wrote:
> Since kernel 4.14, we fixed the TPM sleep logic
> from msleep to usleep_range, so that the TPM
> sleeps exactly with TPM_TIMEOUT (=5ms) afterward.
> Before that fix, msleep(5) actually sleeps for
> around 15ms.
> The fix is https://github.com/torvalds/linux/commit/9f3fc7bcddcb51234e23494531f93ab60475e1c3
> 
> That fix uncovered that the TPM_TIMEOUT was not properly
> set previously. We recently found the TPM driver in kernel 4.14+
> (including 5.9-rc4) crashes Atmel TPM chips with
> too frequent TPM queries.
> 
> The TPM crash signature is
> ```
> $ tpm_sealdata -z
> Tspi_Key_LoadKey failed: 0x00001087 - layer=tddl, code=0087 (135), I/O error
> 
> $ sudo dmesg | grep tpm0
> [59154.665549] tpm tpm0: tpm_try_transmit: send(): error -62
> [59154.809532] tpm tpm0: tpm_try_transmit: send(): error -62
> ```
> 
> >From the error code "-62", it looks similar to another bug
> https://patchwork.kernel.org/patch/10520247/
> where the "TPM_TIMEOUT_USECS_MAX" and "TPM_TIMEOUT_USEC_MIN"
> is too small, which causes TPM get queried too frequently,
> and thus crashes.
> 
> This patch fix the TPM_TIMEOUT to 15ms which was
> the actual timeout TPM chips use before the fix
> from msleep to usleep_range. Thus fixed the crash.
> 
> Test Plan:
> - Run fixed kernel on system with Atmel TPM chips
>   and ensure crash does not happen
> - Run fixed kernel on system with other TPM chips
>   (IFX / WEC / STM) ensure not breakages from tpm-tool
> ---
>  drivers/char/tpm/tpm.h | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/char/tpm/tpm.h b/drivers/char/tpm/tpm.h
> index 947d1db0a5cc..73259ac0a997 100644
> --- a/drivers/char/tpm/tpm.h
> +++ b/drivers/char/tpm/tpm.h
> @@ -37,7 +37,7 @@
>  #define TPM_RETRY		50
>  
>  enum tpm_timeout {
> -	TPM_TIMEOUT = 5,	/* msecs */
> +	TPM_TIMEOUT = 15,	/* msecs */
>  	TPM_TIMEOUT_RETRY = 100, /* msecs */
>  	TPM_TIMEOUT_RANGE_US = 300,	/* usecs */
>  	TPM_TIMEOUT_POLL = 1,	/* msecs */
> -- 
> 2.17.1
> 

Hi,

This is the friendly patch-bot of Greg Kroah-Hartman.  You have sent him
a patch that has triggered this response.  He used to manually respond
to these common problems, but in order to save his sanity (he kept
writing the same thing over and over, yet to different people), I was
created.  Hopefully you will not take offence and will fix the problem
in your patch and resubmit it so that it can be accepted into the Linux
kernel tree.

You are receiving this message because of the following common error(s)
as indicated below:

- Your patch does not have a Signed-off-by: line.  Please read the
  kernel file, Documentation/SubmittingPatches and resend it after
  adding that line.  Note, the line needs to be in the body of the
  email, before the patch, not at the bottom of the patch or in the
  email signature.

If you wish to discuss this problem further, or you have questions about
how to resolve this issue, please feel free to respond to this email and
Greg will reply once he has dug out from the pending patches received
from other developers.

thanks,

greg k-h's patch email bot

^ permalink raw reply	[flat|nested] 91+ messages in thread

* [PATCH] Fix Atmel TPM crash caused by too frequent queries
@ 2020-09-14  6:13 Hao Wu
  2020-09-14  6:17 ` Greg KH
  2020-09-15  2:52 ` Hao Wu
  0 siblings, 2 replies; 91+ messages in thread
From: Hao Wu @ 2020-09-14  6:13 UTC (permalink / raw)
  To: peterhuewe, jarkko.sakkinen, jgg, arnd, gregkh, hamza, james.l.morris
  Cc: linux-integrity, pmenzel, kgold, seungyeop.han, shrihari.kalkar,
	hao.wu, anish.jhaveri

Since kernel 4.14, we fixed the TPM sleep logic
from msleep to usleep_range, so that the TPM
sleeps exactly with TPM_TIMEOUT (=5ms) afterward.
Before that fix, msleep(5) actually sleeps for
around 15ms.
The fix is https://github.com/torvalds/linux/commit/9f3fc7bcddcb51234e23494531f93ab60475e1c3

That fix uncovered that the TPM_TIMEOUT was not properly
set previously. We recently found the TPM driver in kernel 4.14+
(including 5.9-rc4) crashes Atmel TPM chips with
too frequent TPM queries.

The TPM crash signature is
```
$ tpm_sealdata -z
Tspi_Key_LoadKey failed: 0x00001087 - layer=tddl, code=0087 (135), I/O error

$ sudo dmesg | grep tpm0
[59154.665549] tpm tpm0: tpm_try_transmit: send(): error -62
[59154.809532] tpm tpm0: tpm_try_transmit: send(): error -62
```

From the error code "-62", it looks similar to another bug
https://patchwork.kernel.org/patch/10520247/
where the "TPM_TIMEOUT_USECS_MAX" and "TPM_TIMEOUT_USEC_MIN"
is too small, which causes TPM get queried too frequently,
and thus crashes.

This patch fix the TPM_TIMEOUT to 15ms which was
the actual timeout TPM chips use before the fix
from msleep to usleep_range. Thus fixed the crash.

Test Plan:
- Run fixed kernel on system with Atmel TPM chips
  and ensure crash does not happen
- Run fixed kernel on system with other TPM chips
  (IFX / WEC / STM) ensure not breakages from tpm-tool
---
 drivers/char/tpm/tpm.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/char/tpm/tpm.h b/drivers/char/tpm/tpm.h
index 947d1db0a5cc..73259ac0a997 100644
--- a/drivers/char/tpm/tpm.h
+++ b/drivers/char/tpm/tpm.h
@@ -37,7 +37,7 @@
 #define TPM_RETRY		50
 
 enum tpm_timeout {
-	TPM_TIMEOUT = 5,	/* msecs */
+	TPM_TIMEOUT = 15,	/* msecs */
 	TPM_TIMEOUT_RETRY = 100, /* msecs */
 	TPM_TIMEOUT_RANGE_US = 300,	/* usecs */
 	TPM_TIMEOUT_POLL = 1,	/* msecs */
-- 
2.17.1


^ permalink raw reply	[flat|nested] 91+ messages in thread

end of thread, other threads:[~2021-09-08  8:33 UTC | newest]

Thread overview: 91+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-06-20 23:18 [PATCH] Fix Atmel TPM crash caused by too frequent queries Hao Wu
2021-06-23 13:35 ` Jarkko Sakkinen
2021-06-24  5:49   ` Hao Wu
2021-06-29 20:06     ` Jarkko Sakkinen
2021-06-30  4:27       ` Hao Wu
2021-06-24  5:33 ` Hao Wu
2021-06-29 20:07   ` Jarkko Sakkinen
2021-06-30  4:22   ` [PATCH] tpm: fix ATMEL " Hao Wu
2021-07-02  6:35     ` Jarkko Sakkinen
2021-07-02  7:12       ` Greg KH
2021-07-02  7:33       ` Hao Wu
2021-07-02  7:35         ` Hao Wu
2021-07-02  7:45         ` Jarkko Sakkinen
2021-07-02  7:59           ` Hao Wu
2021-07-02  8:42             ` Jarkko Sakkinen
2021-07-02 11:57               ` Jarkko Sakkinen
2021-07-02 19:16                 ` Hao Wu
2021-07-05  5:19                   ` Jarkko Sakkinen
2021-07-05  5:29                     ` Hao Wu
2021-07-04  0:07     ` Hao Wu
2021-07-05  7:15       ` Jarkko Sakkinen
2021-07-05 23:09         ` Hao Wu
2021-07-06 12:34           ` Mimi Zohar
2021-07-07  4:18             ` Hao Wu
2021-07-07  4:34               ` Hao Wu
2021-07-07  4:31     ` [PATCH v2] " Hao Wu
2021-07-07  9:24       ` Jarkko Sakkinen
2021-07-07 18:28         ` Hao Wu
2021-07-07 21:10           ` Jarkko Sakkinen
2021-07-09  4:43             ` Hao Wu
2021-07-09  4:40     ` [PATCH v2] tpm: fix Atmel " Hao Wu
2021-07-09 17:47       ` Jarkko Sakkinen
2021-07-09 19:23         ` Hao Wu
2021-07-11  7:37           ` Hao Wu
2021-07-16  5:30             ` Hao Wu
2021-07-11  7:51       ` [PATCH v3] " Hao Wu
2021-07-27  2:46         ` Jarkko Sakkinen
2021-07-27  3:40           ` Hao Wu
2021-08-14 22:25         ` [PATCH v4] " Hao Wu
2021-08-26  5:38           ` Hao Wu
2021-08-26 16:24             ` Jarkko Sakkinen
2021-08-27  0:35               ` Hao Wu
2021-09-04 21:14                 ` Hao Wu
2021-09-04 23:15                   ` Hao Wu
2021-09-05  3:51           ` [PATCH v5] " Hao Wu
2021-09-07 17:43             ` Jarkko Sakkinen
2021-09-08  8:33               ` Hao Wu
  -- strict thread matches above, loose matches on Subject: below --
2020-09-26 22:31 [PATCH] Fix " Hao Wu
2020-09-26 22:57 ` James Bottomley
2020-09-26 23:10   ` Hao Wu
2020-09-27 18:25     ` James Bottomley
2020-09-28  0:11       ` Hao Wu
2020-09-28  0:15         ` Hao Wu
2020-09-28  1:22         ` James Bottomley
2020-09-28  5:59           ` Hao Wu
2020-09-28 22:11             ` James Bottomley
2020-09-29  4:46               ` Hao Wu
2020-09-30  2:16               ` Jarkko Sakkinen
2020-09-30 14:54                 ` James Bottomley
2020-09-30 15:37                   ` Jarkko Sakkinen
2020-09-30 20:48                     ` James Bottomley
2020-09-30 21:09                       ` Jarkko Sakkinen
2020-09-30 22:31                         ` James Bottomley
2020-10-01  1:50                           ` Jarkko Sakkinen
2020-10-01  4:53                             ` James Bottomley
2020-10-01 18:15                               ` Nayna
2020-10-01 18:32                                 ` James Bottomley
2020-10-01 23:04                                   ` Jarkko Sakkinen
2020-10-17  6:11                                     ` Hao Wu
2020-10-18  5:09                                       ` Jarkko Sakkinen
2020-10-18  5:20                                         ` Hao Wu
2020-11-14  4:39                                           ` Hao Wu
2020-11-18 21:11                                             ` Jarkko Sakkinen
2020-11-18 23:23                                               ` Hao Wu
2021-05-09  6:18                                               ` Hao Wu
2021-05-09  6:31                                                 ` Hao Wu
2021-05-10  2:17                                                   ` Mimi Zohar
2021-05-10  3:15                                                     ` Hao Wu
2021-05-10 17:28                                                     ` Jarkko Sakkinen
2020-09-28  1:08       ` Jarkko Sakkinen
2020-09-28  6:03         ` Hao Wu
2020-09-28 14:16           ` Jarkko Sakkinen
2020-09-28 17:49             ` Hao Wu
2020-09-28 19:47               ` Jarkko Sakkinen
2020-09-28 20:27                 ` Hao Wu
2020-09-30  2:11                   ` Jarkko Sakkinen
2020-09-30  3:41                     ` Hao Wu
     [not found]                       ` <EA1EE8F8-F054-4E1B-B830-231398D33CB8@rubrik.com>
2020-10-01 14:16                         ` Mimi Zohar
2020-09-14  6:13 Hao Wu
2020-09-14  6:17 ` Greg KH
2020-09-15  2:52 ` Hao Wu

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).