linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] tpm_tis: Disable interrupts on ThinkPad T490s
@ 2020-10-15 21:44 Jerry Snitselaar
  2020-10-15 22:01 ` Jerry Snitselaar
                   ` (3 more replies)
  0 siblings, 4 replies; 22+ messages in thread
From: Jerry Snitselaar @ 2020-10-15 21:44 UTC (permalink / raw)
  To: linux-integrity
  Cc: linux-kernel, jarkko, Peter Huewe, Jason Gunthorpe,
	Hans de Goede, James Bottomley

There is a misconfiguration in the bios of the gpio pin used for the
interrupt in the T490s. When interrupts are enabled in the tpm_tis
driver code this results in an interrupt storm. This was initially
reported when we attempted to enable the interrupt code in the tpm_tis
driver, which previously wasn't setting a flag to enable it. Due to
the reports of the interrupt storm that code was reverted and we went back
to polling instead of using interrupts. Now that we know the T490s problem
is a firmware issue, add code to check if the system is a T490s and
disable interrupts if that is the case. This will allow us to enable
interrupts for everyone else. If the user has a fixed bios they can
force the enabling of interrupts with tpm_tis.interrupts=1 on the
kernel command line.

Cc: jarkko@kernel.org
Cc: Peter Huewe <peterhuewe@gmx.de>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Hans de Goede <hdegoede@redhat.com>
Reviewed-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Signed-off-by: Jerry Snitselaar <jsnitsel@redhat.com>
---
 drivers/char/tpm/tpm_tis.c | 29 +++++++++++++++++++++++++++--
 1 file changed, 27 insertions(+), 2 deletions(-)

diff --git a/drivers/char/tpm/tpm_tis.c b/drivers/char/tpm/tpm_tis.c
index 0b214963539d..4ed6e660273a 100644
--- a/drivers/char/tpm/tpm_tis.c
+++ b/drivers/char/tpm/tpm_tis.c
@@ -27,6 +27,7 @@
 #include <linux/of.h>
 #include <linux/of_device.h>
 #include <linux/kernel.h>
+#include <linux/dmi.h>
 #include "tpm.h"
 #include "tpm_tis_core.h"
 
@@ -49,8 +50,8 @@ static inline struct tpm_tis_tcg_phy *to_tpm_tis_tcg_phy(struct tpm_tis_data *da
 	return container_of(data, struct tpm_tis_tcg_phy, priv);
 }
 
-static bool interrupts = true;
-module_param(interrupts, bool, 0444);
+static int interrupts = -1;
+module_param(interrupts, int, 0444);
 MODULE_PARM_DESC(interrupts, "Enable interrupts");
 
 static bool itpm;
@@ -63,6 +64,28 @@ module_param(force, bool, 0444);
 MODULE_PARM_DESC(force, "Force device probe rather than using ACPI entry");
 #endif
 
+static int tpm_tis_disable_irq(const struct dmi_system_id *d)
+{
+	if (interrupts == -1) {
+		pr_notice("tpm_tis: %s detected: disabling interrupts.\n", d->ident);
+		interrupts = 0;
+	}
+
+	return 0;
+}
+
+static const struct dmi_system_id tpm_tis_dmi_table[] = {
+	{
+		.callback = tpm_tis_disable_irq,
+		.ident = "ThinkPad T490s",
+		.matches = {
+			DMI_MATCH(DMI_SYS_VENDOR, "LENOVO"),
+			DMI_MATCH(DMI_PRODUCT_VERSION, "ThinkPad T490s"),
+		},
+	},
+	{}
+};
+
 #if defined(CONFIG_PNP) && defined(CONFIG_ACPI)
 static int has_hid(struct acpi_device *dev, const char *hid)
 {
@@ -192,6 +215,8 @@ static int tpm_tis_init(struct device *dev, struct tpm_info *tpm_info)
 	int irq = -1;
 	int rc;
 
+	dmi_check_system(tpm_tis_dmi_table);
+
 	rc = check_acpi_tpm2(dev);
 	if (rc)
 		return rc;
-- 
2.28.0


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: [PATCH] tpm_tis: Disable interrupts on ThinkPad T490s
  2020-10-15 21:44 [PATCH] tpm_tis: Disable interrupts on ThinkPad T490s Jerry Snitselaar
@ 2020-10-15 22:01 ` Jerry Snitselaar
  2020-10-15 22:39 ` Matthew Garrett
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 22+ messages in thread
From: Jerry Snitselaar @ 2020-10-15 22:01 UTC (permalink / raw)
  To: James Bottomley
  Cc: linux-kernel, jarkko, Peter Huewe, Jason Gunthorpe,
	Hans de Goede, linux-integrity


James should this get tacked on the end of your patchset?

Regards,
Jerry


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] tpm_tis: Disable interrupts on ThinkPad T490s
  2020-10-15 21:44 [PATCH] tpm_tis: Disable interrupts on ThinkPad T490s Jerry Snitselaar
  2020-10-15 22:01 ` Jerry Snitselaar
@ 2020-10-15 22:39 ` Matthew Garrett
  2020-10-16  6:12   ` Hans de Goede
  2020-11-19  6:36   ` Jerry Snitselaar
  2020-10-16  6:10 ` Hans de Goede
  2020-10-18 21:11 ` Jarkko Sakkinen
  3 siblings, 2 replies; 22+ messages in thread
From: Matthew Garrett @ 2020-10-15 22:39 UTC (permalink / raw)
  To: Jerry Snitselaar
  Cc: linux-integrity, Linux Kernel Mailing List, jarkko, Peter Huewe,
	Jason Gunthorpe, Hans de Goede, James Bottomley

On Thu, Oct 15, 2020 at 2:44 PM Jerry Snitselaar <jsnitsel@redhat.com> wrote:
>
> There is a misconfiguration in the bios of the gpio pin used for the
> interrupt in the T490s. When interrupts are enabled in the tpm_tis
> driver code this results in an interrupt storm. This was initially
> reported when we attempted to enable the interrupt code in the tpm_tis
> driver, which previously wasn't setting a flag to enable it. Due to
> the reports of the interrupt storm that code was reverted and we went back
> to polling instead of using interrupts. Now that we know the T490s problem
> is a firmware issue, add code to check if the system is a T490s and
> disable interrupts if that is the case. This will allow us to enable
> interrupts for everyone else. If the user has a fixed bios they can
> force the enabling of interrupts with tpm_tis.interrupts=1 on the
> kernel command line.

I think an implication of this is that systems haven't been
well-tested with interrupts enabled. In general when we've found a
firmware issue in one place it ends up happening elsewhere as well, so
it wouldn't surprise me if there are other machines that will also be
unhappy with interrupts enabled. Would it be possible to automatically
detect this case (eg, if we get more than a certain number of
interrupts in a certain timeframe immediately after enabling the
interrupt) and automatically fall back to polling in that case? It
would also mean that users with fixed firmware wouldn't need to pass a
parameter.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] tpm_tis: Disable interrupts on ThinkPad T490s
  2020-10-15 21:44 [PATCH] tpm_tis: Disable interrupts on ThinkPad T490s Jerry Snitselaar
  2020-10-15 22:01 ` Jerry Snitselaar
  2020-10-15 22:39 ` Matthew Garrett
@ 2020-10-16  6:10 ` Hans de Goede
  2020-10-18 21:11 ` Jarkko Sakkinen
  3 siblings, 0 replies; 22+ messages in thread
From: Hans de Goede @ 2020-10-16  6:10 UTC (permalink / raw)
  To: Jerry Snitselaar, linux-integrity
  Cc: linux-kernel, jarkko, Peter Huewe, Jason Gunthorpe, James Bottomley

Hi,

On 10/15/20 11:44 PM, Jerry Snitselaar wrote:
> There is a misconfiguration in the bios of the gpio pin used for the
> interrupt in the T490s. When interrupts are enabled in the tpm_tis
> driver code this results in an interrupt storm. This was initially
> reported when we attempted to enable the interrupt code in the tpm_tis
> driver, which previously wasn't setting a flag to enable it. Due to
> the reports of the interrupt storm that code was reverted and we went back
> to polling instead of using interrupts. Now that we know the T490s problem
> is a firmware issue, add code to check if the system is a T490s and
> disable interrupts if that is the case. This will allow us to enable
> interrupts for everyone else. If the user has a fixed bios they can
> force the enabling of interrupts with tpm_tis.interrupts=1 on the
> kernel command line.
> 
> Cc: jarkko@kernel.org
> Cc: Peter Huewe <peterhuewe@gmx.de>
> Cc: Jason Gunthorpe <jgg@ziepe.ca>
> Cc: Hans de Goede <hdegoede@redhat.com>
> Reviewed-by: James Bottomley <James.Bottomley@HansenPartnership.com>
> Signed-off-by: Jerry Snitselaar <jsnitsel@redhat.com>

Patch looks good to me:

Reviewed-by: Hans de Goede <hdegoede@redhat.com>

Regards,

Hans


> ---
>   drivers/char/tpm/tpm_tis.c | 29 +++++++++++++++++++++++++++--
>   1 file changed, 27 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/char/tpm/tpm_tis.c b/drivers/char/tpm/tpm_tis.c
> index 0b214963539d..4ed6e660273a 100644
> --- a/drivers/char/tpm/tpm_tis.c
> +++ b/drivers/char/tpm/tpm_tis.c
> @@ -27,6 +27,7 @@
>   #include <linux/of.h>
>   #include <linux/of_device.h>
>   #include <linux/kernel.h>
> +#include <linux/dmi.h>
>   #include "tpm.h"
>   #include "tpm_tis_core.h"
>   
> @@ -49,8 +50,8 @@ static inline struct tpm_tis_tcg_phy *to_tpm_tis_tcg_phy(struct tpm_tis_data *da
>   	return container_of(data, struct tpm_tis_tcg_phy, priv);
>   }
>   
> -static bool interrupts = true;
> -module_param(interrupts, bool, 0444);
> +static int interrupts = -1;
> +module_param(interrupts, int, 0444);
>   MODULE_PARM_DESC(interrupts, "Enable interrupts");
>   
>   static bool itpm;
> @@ -63,6 +64,28 @@ module_param(force, bool, 0444);
>   MODULE_PARM_DESC(force, "Force device probe rather than using ACPI entry");
>   #endif
>   
> +static int tpm_tis_disable_irq(const struct dmi_system_id *d)
> +{
> +	if (interrupts == -1) {
> +		pr_notice("tpm_tis: %s detected: disabling interrupts.\n", d->ident);
> +		interrupts = 0;
> +	}
> +
> +	return 0;
> +}
> +
> +static const struct dmi_system_id tpm_tis_dmi_table[] = {
> +	{
> +		.callback = tpm_tis_disable_irq,
> +		.ident = "ThinkPad T490s",
> +		.matches = {
> +			DMI_MATCH(DMI_SYS_VENDOR, "LENOVO"),
> +			DMI_MATCH(DMI_PRODUCT_VERSION, "ThinkPad T490s"),
> +		},
> +	},
> +	{}
> +};
> +
>   #if defined(CONFIG_PNP) && defined(CONFIG_ACPI)
>   static int has_hid(struct acpi_device *dev, const char *hid)
>   {
> @@ -192,6 +215,8 @@ static int tpm_tis_init(struct device *dev, struct tpm_info *tpm_info)
>   	int irq = -1;
>   	int rc;
>   
> +	dmi_check_system(tpm_tis_dmi_table);
> +
>   	rc = check_acpi_tpm2(dev);
>   	if (rc)
>   		return rc;
> 


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] tpm_tis: Disable interrupts on ThinkPad T490s
  2020-10-15 22:39 ` Matthew Garrett
@ 2020-10-16  6:12   ` Hans de Goede
  2020-11-19  6:36   ` Jerry Snitselaar
  1 sibling, 0 replies; 22+ messages in thread
From: Hans de Goede @ 2020-10-16  6:12 UTC (permalink / raw)
  To: Matthew Garrett, Jerry Snitselaar
  Cc: linux-integrity, Linux Kernel Mailing List, jarkko, Peter Huewe,
	Jason Gunthorpe, James Bottomley

Hi,

On 10/16/20 12:39 AM, Matthew Garrett wrote:
> On Thu, Oct 15, 2020 at 2:44 PM Jerry Snitselaar <jsnitsel@redhat.com> wrote:
>>
>> There is a misconfiguration in the bios of the gpio pin used for the
>> interrupt in the T490s. When interrupts are enabled in the tpm_tis
>> driver code this results in an interrupt storm. This was initially
>> reported when we attempted to enable the interrupt code in the tpm_tis
>> driver, which previously wasn't setting a flag to enable it. Due to
>> the reports of the interrupt storm that code was reverted and we went back
>> to polling instead of using interrupts. Now that we know the T490s problem
>> is a firmware issue, add code to check if the system is a T490s and
>> disable interrupts if that is the case. This will allow us to enable
>> interrupts for everyone else. If the user has a fixed bios they can
>> force the enabling of interrupts with tpm_tis.interrupts=1 on the
>> kernel command line.
> 
> I think an implication of this is that systems haven't been
> well-tested with interrupts enabled. In general when we've found a
> firmware issue in one place it ends up happening elsewhere as well, so
> it wouldn't surprise me if there are other machines that will also be
> unhappy with interrupts enabled. Would it be possible to automatically
> detect this case (eg, if we get more than a certain number of
> interrupts in a certain timeframe immediately after enabling the
> interrupt) and automatically fall back to polling in that case? It
> would also mean that users with fixed firmware wouldn't need to pass a
> parameter.

IIRC then at least on the T490 the irq storm caused systems to not boot
in some cases. I guess if we detect the storm and disable the irq we might
fix that... OTOH this problem seems to only hit a certain generation of
Thinkpads so with some luck the denylist should not be too big and the denylist
approach should work.

Regards,

Hans


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] tpm_tis: Disable interrupts on ThinkPad T490s
  2020-10-15 21:44 [PATCH] tpm_tis: Disable interrupts on ThinkPad T490s Jerry Snitselaar
                   ` (2 preceding siblings ...)
  2020-10-16  6:10 ` Hans de Goede
@ 2020-10-18 21:11 ` Jarkko Sakkinen
  2020-10-18 21:14   ` Jarkko Sakkinen
  3 siblings, 1 reply; 22+ messages in thread
From: Jarkko Sakkinen @ 2020-10-18 21:11 UTC (permalink / raw)
  To: Jerry Snitselaar
  Cc: linux-integrity, linux-kernel, jarkko, Peter Huewe,
	Jason Gunthorpe, Hans de Goede, James Bottomley

On Thu, Oct 15, 2020 at 02:44:30PM -0700, Jerry Snitselaar wrote:
> There is a misconfiguration in the bios of the gpio pin used for the
> interrupt in the T490s. When interrupts are enabled in the tpm_tis
> driver code this results in an interrupt storm. This was initially
> reported when we attempted to enable the interrupt code in the tpm_tis
> driver, which previously wasn't setting a flag to enable it. Due to
> the reports of the interrupt storm that code was reverted and we went back
> to polling instead of using interrupts. Now that we know the T490s problem
> is a firmware issue, add code to check if the system is a T490s and
> disable interrupts if that is the case. This will allow us to enable
> interrupts for everyone else. If the user has a fixed bios they can
> force the enabling of interrupts with tpm_tis.interrupts=1 on the
> kernel command line.
> 
> Cc: jarkko@kernel.org
> Cc: Peter Huewe <peterhuewe@gmx.de>
> Cc: Jason Gunthorpe <jgg@ziepe.ca>
> Cc: Hans de Goede <hdegoede@redhat.com>
> Reviewed-by: James Bottomley <James.Bottomley@HansenPartnership.com>
> Signed-off-by: Jerry Snitselaar <jsnitsel@redhat.com>

Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>

I'll apply this and make it available in linux-next.

/Jarkko

> ---
>  drivers/char/tpm/tpm_tis.c | 29 +++++++++++++++++++++++++++--
>  1 file changed, 27 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/char/tpm/tpm_tis.c b/drivers/char/tpm/tpm_tis.c
> index 0b214963539d..4ed6e660273a 100644
> --- a/drivers/char/tpm/tpm_tis.c
> +++ b/drivers/char/tpm/tpm_tis.c
> @@ -27,6 +27,7 @@
>  #include <linux/of.h>
>  #include <linux/of_device.h>
>  #include <linux/kernel.h>
> +#include <linux/dmi.h>
>  #include "tpm.h"
>  #include "tpm_tis_core.h"
>  
> @@ -49,8 +50,8 @@ static inline struct tpm_tis_tcg_phy *to_tpm_tis_tcg_phy(struct tpm_tis_data *da
>  	return container_of(data, struct tpm_tis_tcg_phy, priv);
>  }
>  
> -static bool interrupts = true;
> -module_param(interrupts, bool, 0444);
> +static int interrupts = -1;
> +module_param(interrupts, int, 0444);
>  MODULE_PARM_DESC(interrupts, "Enable interrupts");
>  
>  static bool itpm;
> @@ -63,6 +64,28 @@ module_param(force, bool, 0444);
>  MODULE_PARM_DESC(force, "Force device probe rather than using ACPI entry");
>  #endif
>  
> +static int tpm_tis_disable_irq(const struct dmi_system_id *d)
> +{
> +	if (interrupts == -1) {
> +		pr_notice("tpm_tis: %s detected: disabling interrupts.\n", d->ident);
> +		interrupts = 0;
> +	}
> +
> +	return 0;
> +}
> +
> +static const struct dmi_system_id tpm_tis_dmi_table[] = {
> +	{
> +		.callback = tpm_tis_disable_irq,
> +		.ident = "ThinkPad T490s",
> +		.matches = {
> +			DMI_MATCH(DMI_SYS_VENDOR, "LENOVO"),
> +			DMI_MATCH(DMI_PRODUCT_VERSION, "ThinkPad T490s"),
> +		},
> +	},
> +	{}
> +};
> +
>  #if defined(CONFIG_PNP) && defined(CONFIG_ACPI)
>  static int has_hid(struct acpi_device *dev, const char *hid)
>  {
> @@ -192,6 +215,8 @@ static int tpm_tis_init(struct device *dev, struct tpm_info *tpm_info)
>  	int irq = -1;
>  	int rc;
>  
> +	dmi_check_system(tpm_tis_dmi_table);
> +
>  	rc = check_acpi_tpm2(dev);
>  	if (rc)
>  		return rc;
> -- 
> 2.28.0
> 
> 

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] tpm_tis: Disable interrupts on ThinkPad T490s
  2020-10-18 21:11 ` Jarkko Sakkinen
@ 2020-10-18 21:14   ` Jarkko Sakkinen
  0 siblings, 0 replies; 22+ messages in thread
From: Jarkko Sakkinen @ 2020-10-18 21:14 UTC (permalink / raw)
  To: Jerry Snitselaar
  Cc: linux-integrity, linux-kernel, jarkko, Peter Huewe,
	Jason Gunthorpe, Hans de Goede, James Bottomley

On Mon, Oct 19, 2020 at 12:11:44AM +0300, Jarkko Sakkinen wrote:
> On Thu, Oct 15, 2020 at 02:44:30PM -0700, Jerry Snitselaar wrote:
> > There is a misconfiguration in the bios of the gpio pin used for the
> > interrupt in the T490s. When interrupts are enabled in the tpm_tis
> > driver code this results in an interrupt storm. This was initially
> > reported when we attempted to enable the interrupt code in the tpm_tis
> > driver, which previously wasn't setting a flag to enable it. Due to
> > the reports of the interrupt storm that code was reverted and we went back
> > to polling instead of using interrupts. Now that we know the T490s problem
> > is a firmware issue, add code to check if the system is a T490s and
> > disable interrupts if that is the case. This will allow us to enable
> > interrupts for everyone else. If the user has a fixed bios they can
> > force the enabling of interrupts with tpm_tis.interrupts=1 on the
> > kernel command line.
> > 
> > Cc: jarkko@kernel.org
> > Cc: Peter Huewe <peterhuewe@gmx.de>
> > Cc: Jason Gunthorpe <jgg@ziepe.ca>
> > Cc: Hans de Goede <hdegoede@redhat.com>
> > Reviewed-by: James Bottomley <James.Bottomley@HansenPartnership.com>
> > Signed-off-by: Jerry Snitselaar <jsnitsel@redhat.com>
> 
> Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>
> 
> I'll apply this and make it available in linux-next.

Applied.

Thank you.

/Jarkko

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] tpm_tis: Disable interrupts on ThinkPad T490s
  2020-10-15 22:39 ` Matthew Garrett
  2020-10-16  6:12   ` Hans de Goede
@ 2020-11-19  6:36   ` Jerry Snitselaar
  2020-11-19 14:42     ` Hans de Goede
  2020-11-24  3:26     ` Jarkko Sakkinen
  1 sibling, 2 replies; 22+ messages in thread
From: Jerry Snitselaar @ 2020-11-19  6:36 UTC (permalink / raw)
  To: Matthew Garrett
  Cc: linux-integrity, Linux Kernel Mailing List, jarkko, Peter Huewe,
	Jason Gunthorpe, Hans de Goede, James Bottomley


Matthew Garrett @ 2020-10-15 15:39 MST:

> On Thu, Oct 15, 2020 at 2:44 PM Jerry Snitselaar <jsnitsel@redhat.com> wrote:
>>
>> There is a misconfiguration in the bios of the gpio pin used for the
>> interrupt in the T490s. When interrupts are enabled in the tpm_tis
>> driver code this results in an interrupt storm. This was initially
>> reported when we attempted to enable the interrupt code in the tpm_tis
>> driver, which previously wasn't setting a flag to enable it. Due to
>> the reports of the interrupt storm that code was reverted and we went back
>> to polling instead of using interrupts. Now that we know the T490s problem
>> is a firmware issue, add code to check if the system is a T490s and
>> disable interrupts if that is the case. This will allow us to enable
>> interrupts for everyone else. If the user has a fixed bios they can
>> force the enabling of interrupts with tpm_tis.interrupts=1 on the
>> kernel command line.
>
> I think an implication of this is that systems haven't been
> well-tested with interrupts enabled. In general when we've found a
> firmware issue in one place it ends up happening elsewhere as well, so
> it wouldn't surprise me if there are other machines that will also be
> unhappy with interrupts enabled. Would it be possible to automatically
> detect this case (eg, if we get more than a certain number of
> interrupts in a certain timeframe immediately after enabling the
> interrupt) and automatically fall back to polling in that case? It
> would also mean that users with fixed firmware wouldn't need to pass a
> parameter.

I believe Matthew is correct here. I found another system today
with completely different vendor for both the system and the tpm chip.
In addition another Lenovo model, the L490, has the issue.

This initial attempt at a solution like Matthew suggested works on
the system I found today, but I imagine it is all sorts of wrong.
In the 2 systems where I've seen it, there are about 100000 interrupts
in around 1.5 seconds, and then the irq code shuts down the interrupt
because they aren't being handled.


diff --git a/drivers/char/tpm/tpm_tis_core.c b/drivers/char/tpm/tpm_tis_core.c
index 49ae09ac604f..478e9d02a3fa 100644
--- a/drivers/char/tpm/tpm_tis_core.c
+++ b/drivers/char/tpm/tpm_tis_core.c
@@ -27,6 +27,11 @@
 #include "tpm.h"
 #include "tpm_tis_core.h"

+static unsigned int time_start = 0;
+static bool storm_check = true;
+static bool storm_killed = false;
+static u32 irqs_fired = 0;
+
 static void tpm_tis_clkrun_enable(struct tpm_chip *chip, bool value);

 static void tpm_tis_enable_interrupt(struct tpm_chip *chip, u8 mask)
@@ -464,25 +469,31 @@ static int tpm_tis_send_data(struct tpm_chip *chip, const u8 *buf, size_t len)
        return rc;
 }

-static void disable_interrupts(struct tpm_chip *chip)
+static void __disable_interrupts(struct tpm_chip *chip)
 {
        struct tpm_tis_data *priv = dev_get_drvdata(&chip->dev);
        u32 intmask;
        int rc;

-       if (priv->irq == 0)
-               return;
-
        rc = tpm_tis_read32(priv, TPM_INT_ENABLE(priv->locality), &intmask);
        if (rc < 0)
                intmask = 0;

        intmask &= ~TPM_GLOBAL_INT_ENABLE;
        rc = tpm_tis_write32(priv, TPM_INT_ENABLE(priv->locality), intmask);
+       chip->flags &= ~TPM_CHIP_FLAG_IRQ;
+}
+
+static void disable_interrupts(struct tpm_chip *chip)
+{
+       struct tpm_tis_data *priv = dev_get_drvdata(&chip->dev);

+       if (priv->irq == 0)
+               return;
+
+       __disable_interrupts(chip);
        devm_free_irq(chip->dev.parent, priv->irq, chip);
        priv->irq = 0;
-       chip->flags &= ~TPM_CHIP_FLAG_IRQ;
 }

 /*
@@ -528,6 +539,12 @@ static int tpm_tis_send(struct tpm_chip *chip, u8 *buf, size_t len)
        int rc, irq;
        struct tpm_tis_data *priv = dev_get_drvdata(&chip->dev);

+       if (unlikely(storm_killed)) {
+               devm_free_irq(chip->dev.parent, priv->irq, chip);
+               priv->irq = 0;
+               storm_killed = false;
+       }
+
        if (!(chip->flags & TPM_CHIP_FLAG_IRQ) || priv->irq_tested)
                return tpm_tis_send_main(chip, buf, len);

@@ -748,6 +765,21 @@ static irqreturn_t tis_int_handler(int dummy, void *dev_id)
        u32 interrupt;
        int i, rc;

+       if (storm_check) {
+               irqs_fired++;
+
+               if (!time_start) {
+                       time_start = jiffies_to_msecs(jiffies);
+               } else if ((irqs_fired > 1000) && (jiffies_to_msecs(jiffies) - jiffies < 500)) {
+                       __disable_interrupts(chip);
+                       storm_check = false;
+                       storm_killed = true;
+                       return IRQ_HANDLED;
+               } else if ((jiffies_to_msecs(jiffies) - time_start > 500) && (irqs_fired < 1000)) {
+                       storm_check = false;
+               }
+       }
+
        rc = tpm_tis_read32(priv, TPM_INT_STATUS(priv->locality), &interrupt);
        if (rc < 0)
                return IRQ_NONE;


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: [PATCH] tpm_tis: Disable interrupts on ThinkPad T490s
  2020-11-19  6:36   ` Jerry Snitselaar
@ 2020-11-19 14:42     ` Hans de Goede
  2020-11-19 17:05       ` Jerry Snitselaar
  2020-11-24  3:27       ` Jarkko Sakkinen
  2020-11-24  3:26     ` Jarkko Sakkinen
  1 sibling, 2 replies; 22+ messages in thread
From: Hans de Goede @ 2020-11-19 14:42 UTC (permalink / raw)
  To: Jerry Snitselaar, Matthew Garrett
  Cc: linux-integrity, Linux Kernel Mailing List, jarkko, Peter Huewe,
	Jason Gunthorpe, James Bottomley

Hi,

On 11/19/20 7:36 AM, Jerry Snitselaar wrote:
> 
> Matthew Garrett @ 2020-10-15 15:39 MST:
> 
>> On Thu, Oct 15, 2020 at 2:44 PM Jerry Snitselaar <jsnitsel@redhat.com> wrote:
>>>
>>> There is a misconfiguration in the bios of the gpio pin used for the
>>> interrupt in the T490s. When interrupts are enabled in the tpm_tis
>>> driver code this results in an interrupt storm. This was initially
>>> reported when we attempted to enable the interrupt code in the tpm_tis
>>> driver, which previously wasn't setting a flag to enable it. Due to
>>> the reports of the interrupt storm that code was reverted and we went back
>>> to polling instead of using interrupts. Now that we know the T490s problem
>>> is a firmware issue, add code to check if the system is a T490s and
>>> disable interrupts if that is the case. This will allow us to enable
>>> interrupts for everyone else. If the user has a fixed bios they can
>>> force the enabling of interrupts with tpm_tis.interrupts=1 on the
>>> kernel command line.
>>
>> I think an implication of this is that systems haven't been
>> well-tested with interrupts enabled. In general when we've found a
>> firmware issue in one place it ends up happening elsewhere as well, so
>> it wouldn't surprise me if there are other machines that will also be
>> unhappy with interrupts enabled. Would it be possible to automatically
>> detect this case (eg, if we get more than a certain number of
>> interrupts in a certain timeframe immediately after enabling the
>> interrupt) and automatically fall back to polling in that case? It
>> would also mean that users with fixed firmware wouldn't need to pass a
>> parameter.
> 
> I believe Matthew is correct here. I found another system today
> with completely different vendor for both the system and the tpm chip.
> In addition another Lenovo model, the L490, has the issue.
> 
> This initial attempt at a solution like Matthew suggested works on
> the system I found today, but I imagine it is all sorts of wrong.
> In the 2 systems where I've seen it, there are about 100000 interrupts
> in around 1.5 seconds, and then the irq code shuts down the interrupt
> because they aren't being handled.

Is that with your patch? The IRQ should be silenced as soon as
devm_free_irq(chip->dev.parent, priv->irq, chip); is called.

Depending on if we can get your storm-detection to work or not,
we might also choose to just never try to use the IRQ (at least on
x86 systems). AFAIK the TPM is never used for high-throughput stuff
so the polling overhead should not be a big deal (and I'm getting the feeling
that Windows always polls).

Regards,

Hans



> diff --git a/drivers/char/tpm/tpm_tis_core.c b/drivers/char/tpm/tpm_tis_core.c
> index 49ae09ac604f..478e9d02a3fa 100644
> --- a/drivers/char/tpm/tpm_tis_core.c
> +++ b/drivers/char/tpm/tpm_tis_core.c
> @@ -27,6 +27,11 @@
>  #include "tpm.h"
>  #include "tpm_tis_core.h"
> 
> +static unsigned int time_start = 0;
> +static bool storm_check = true;
> +static bool storm_killed = false;
> +static u32 irqs_fired = 0;
> +
>  static void tpm_tis_clkrun_enable(struct tpm_chip *chip, bool value);
> 
>  static void tpm_tis_enable_interrupt(struct tpm_chip *chip, u8 mask)
> @@ -464,25 +469,31 @@ static int tpm_tis_send_data(struct tpm_chip *chip, const u8 *buf, size_t len)
>         return rc;
>  }
> 
> -static void disable_interrupts(struct tpm_chip *chip)
> +static void __disable_interrupts(struct tpm_chip *chip)
>  {
>         struct tpm_tis_data *priv = dev_get_drvdata(&chip->dev);
>         u32 intmask;
>         int rc;
> 
> -       if (priv->irq == 0)
> -               return;
> -
>         rc = tpm_tis_read32(priv, TPM_INT_ENABLE(priv->locality), &intmask);
>         if (rc < 0)
>                 intmask = 0;
> 
>         intmask &= ~TPM_GLOBAL_INT_ENABLE;
>         rc = tpm_tis_write32(priv, TPM_INT_ENABLE(priv->locality), intmask);
> +       chip->flags &= ~TPM_CHIP_FLAG_IRQ;
> +}
> +
> +static void disable_interrupts(struct tpm_chip *chip)
> +{
> +       struct tpm_tis_data *priv = dev_get_drvdata(&chip->dev);
> 
> +       if (priv->irq == 0)
> +               return;
> +
> +       __disable_interrupts(chip);
>         devm_free_irq(chip->dev.parent, priv->irq, chip);
>         priv->irq = 0;
> -       chip->flags &= ~TPM_CHIP_FLAG_IRQ;
>  }
> 
>  /*
> @@ -528,6 +539,12 @@ static int tpm_tis_send(struct tpm_chip *chip, u8 *buf, size_t len)
>         int rc, irq;
>         struct tpm_tis_data *priv = dev_get_drvdata(&chip->dev);
> 
> +       if (unlikely(storm_killed)) {
> +               devm_free_irq(chip->dev.parent, priv->irq, chip);
> +               priv->irq = 0;
> +               storm_killed = false;
> +       }
> +
>         if (!(chip->flags & TPM_CHIP_FLAG_IRQ) || priv->irq_tested)
>                 return tpm_tis_send_main(chip, buf, len);
> 
> @@ -748,6 +765,21 @@ static irqreturn_t tis_int_handler(int dummy, void *dev_id)
>         u32 interrupt;
>         int i, rc;
> 
> +       if (storm_check) {
> +               irqs_fired++;
> +
> +               if (!time_start) {
> +                       time_start = jiffies_to_msecs(jiffies);
> +               } else if ((irqs_fired > 1000) && (jiffies_to_msecs(jiffies) - jiffies < 500)) {
> +                       __disable_interrupts(chip);
> +                       storm_check = false;
> +                       storm_killed = true;
> +                       return IRQ_HANDLED;
> +               } else if ((jiffies_to_msecs(jiffies) - time_start > 500) && (irqs_fired < 1000)) {
> +                       storm_check = false;
> +               }
> +       }
> +
>         rc = tpm_tis_read32(priv, TPM_INT_STATUS(priv->locality), &interrupt);
>         if (rc < 0)
>                 return IRQ_NONE;
> 


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] tpm_tis: Disable interrupts on ThinkPad T490s
  2020-11-19 14:42     ` Hans de Goede
@ 2020-11-19 17:05       ` Jerry Snitselaar
  2020-11-23 12:19         ` Hans de Goede
  2020-11-24  3:27       ` Jarkko Sakkinen
  1 sibling, 1 reply; 22+ messages in thread
From: Jerry Snitselaar @ 2020-11-19 17:05 UTC (permalink / raw)
  To: Hans de Goede
  Cc: Matthew Garrett, linux-integrity, Linux Kernel Mailing List,
	jarkko, Peter Huewe, Jason Gunthorpe, James Bottomley


Hans de Goede @ 2020-11-19 07:42 MST:

> Hi,
>
> On 11/19/20 7:36 AM, Jerry Snitselaar wrote:
>> 
>> Matthew Garrett @ 2020-10-15 15:39 MST:
>> 
>>> On Thu, Oct 15, 2020 at 2:44 PM Jerry Snitselaar <jsnitsel@redhat.com> wrote:
>>>>
>>>> There is a misconfiguration in the bios of the gpio pin used for the
>>>> interrupt in the T490s. When interrupts are enabled in the tpm_tis
>>>> driver code this results in an interrupt storm. This was initially
>>>> reported when we attempted to enable the interrupt code in the tpm_tis
>>>> driver, which previously wasn't setting a flag to enable it. Due to
>>>> the reports of the interrupt storm that code was reverted and we went back
>>>> to polling instead of using interrupts. Now that we know the T490s problem
>>>> is a firmware issue, add code to check if the system is a T490s and
>>>> disable interrupts if that is the case. This will allow us to enable
>>>> interrupts for everyone else. If the user has a fixed bios they can
>>>> force the enabling of interrupts with tpm_tis.interrupts=1 on the
>>>> kernel command line.
>>>
>>> I think an implication of this is that systems haven't been
>>> well-tested with interrupts enabled. In general when we've found a
>>> firmware issue in one place it ends up happening elsewhere as well, so
>>> it wouldn't surprise me if there are other machines that will also be
>>> unhappy with interrupts enabled. Would it be possible to automatically
>>> detect this case (eg, if we get more than a certain number of
>>> interrupts in a certain timeframe immediately after enabling the
>>> interrupt) and automatically fall back to polling in that case? It
>>> would also mean that users with fixed firmware wouldn't need to pass a
>>> parameter.
>> 
>> I believe Matthew is correct here. I found another system today
>> with completely different vendor for both the system and the tpm chip.
>> In addition another Lenovo model, the L490, has the issue.
>> 
>> This initial attempt at a solution like Matthew suggested works on
>> the system I found today, but I imagine it is all sorts of wrong.
>> In the 2 systems where I've seen it, there are about 100000 interrupts
>> in around 1.5 seconds, and then the irq code shuts down the interrupt
>> because they aren't being handled.
>
> Is that with your patch? The IRQ should be silenced as soon as
> devm_free_irq(chip->dev.parent, priv->irq, chip); is called.
>

No that is just with James' patchset that enables interrupts for
tpm_tis. It looks like the irq is firing, but the tpm's int_status
register is clear, so the handler immediately returns IRQ_NONE. After
that happens 100000 times the core irq code shuts down the irq, but it
isn't released so I could still see the stats in /proc/interrupts.  With
my attempt below on top of that patchset it releases the irq. I had to
stick the check prior to it checking the int_status register because it
is cleared and the handler returns, and I couldn't do the devm_free_irq
directly in tis_int_handler, so I tried sticking it in tpm_tis_send
where the other odd irq testing code was already located. I'm not sure
if there is another place that would work better for calling the
devm_free_irq.

> Depending on if we can get your storm-detection to work or not,
> we might also choose to just never try to use the IRQ (at least on
> x86 systems). AFAIK the TPM is never used for high-throughput stuff
> so the polling overhead should not be a big deal (and I'm getting the feeling
> that Windows always polls).
>

I was wondering about Windows as well. In addition to the Lenovo systems
which the T490s had Nuvoton tpm, the system I found yesterday was a development
system we have from a partner with an Infineon tpm. Dan Williams has
seen it internally at Intel as well on some system.

> Regards,
>
> Hans
>
>
>
>> diff --git a/drivers/char/tpm/tpm_tis_core.c b/drivers/char/tpm/tpm_tis_core.c
>> index 49ae09ac604f..478e9d02a3fa 100644
>> --- a/drivers/char/tpm/tpm_tis_core.c
>> +++ b/drivers/char/tpm/tpm_tis_core.c
>> @@ -27,6 +27,11 @@
>>  #include "tpm.h"
>>  #include "tpm_tis_core.h"
>> 
>> +static unsigned int time_start = 0;
>> +static bool storm_check = true;
>> +static bool storm_killed = false;
>> +static u32 irqs_fired = 0;
>> +
>>  static void tpm_tis_clkrun_enable(struct tpm_chip *chip, bool value);
>> 
>>  static void tpm_tis_enable_interrupt(struct tpm_chip *chip, u8 mask)
>> @@ -464,25 +469,31 @@ static int tpm_tis_send_data(struct tpm_chip *chip, const u8 *buf, size_t len)
>>         return rc;
>>  }
>> 
>> -static void disable_interrupts(struct tpm_chip *chip)
>> +static void __disable_interrupts(struct tpm_chip *chip)
>>  {
>>         struct tpm_tis_data *priv = dev_get_drvdata(&chip->dev);
>>         u32 intmask;
>>         int rc;
>> 
>> -       if (priv->irq == 0)
>> -               return;
>> -
>>         rc = tpm_tis_read32(priv, TPM_INT_ENABLE(priv->locality), &intmask);
>>         if (rc < 0)
>>                 intmask = 0;
>> 
>>         intmask &= ~TPM_GLOBAL_INT_ENABLE;
>>         rc = tpm_tis_write32(priv, TPM_INT_ENABLE(priv->locality), intmask);
>> +       chip->flags &= ~TPM_CHIP_FLAG_IRQ;
>> +}
>> +
>> +static void disable_interrupts(struct tpm_chip *chip)
>> +{
>> +       struct tpm_tis_data *priv = dev_get_drvdata(&chip->dev);
>> 
>> +       if (priv->irq == 0)
>> +               return;
>> +
>> +       __disable_interrupts(chip);
>>         devm_free_irq(chip->dev.parent, priv->irq, chip);
>>         priv->irq = 0;
>> -       chip->flags &= ~TPM_CHIP_FLAG_IRQ;
>>  }
>> 
>>  /*
>> @@ -528,6 +539,12 @@ static int tpm_tis_send(struct tpm_chip *chip, u8 *buf, size_t len)
>>         int rc, irq;
>>         struct tpm_tis_data *priv = dev_get_drvdata(&chip->dev);
>> 
>> +       if (unlikely(storm_killed)) {
>> +               devm_free_irq(chip->dev.parent, priv->irq, chip);
>> +               priv->irq = 0;
>> +               storm_killed = false;
>> +       }
>> +
>>         if (!(chip->flags & TPM_CHIP_FLAG_IRQ) || priv->irq_tested)
>>                 return tpm_tis_send_main(chip, buf, len);
>> 
>> @@ -748,6 +765,21 @@ static irqreturn_t tis_int_handler(int dummy, void *dev_id)
>>         u32 interrupt;
>>         int i, rc;
>> 
>> +       if (storm_check) {
>> +               irqs_fired++;
>> +
>> +               if (!time_start) {
>> +                       time_start = jiffies_to_msecs(jiffies);
>> +               } else if ((irqs_fired > 1000) && (jiffies_to_msecs(jiffies) - jiffies < 500)) {
>> +                       __disable_interrupts(chip);
>> +                       storm_check = false;
>> +                       storm_killed = true;
>> +                       return IRQ_HANDLED;
>> +               } else if ((jiffies_to_msecs(jiffies) - time_start > 500) && (irqs_fired < 1000)) {
>> +                       storm_check = false;
>> +               }
>> +       }
>> +
>>         rc = tpm_tis_read32(priv, TPM_INT_STATUS(priv->locality), &interrupt);
>>         if (rc < 0)
>>                 return IRQ_NONE;
>> 


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] tpm_tis: Disable interrupts on ThinkPad T490s
  2020-11-19 17:05       ` Jerry Snitselaar
@ 2020-11-23 12:19         ` Hans de Goede
  0 siblings, 0 replies; 22+ messages in thread
From: Hans de Goede @ 2020-11-23 12:19 UTC (permalink / raw)
  To: Jerry Snitselaar
  Cc: Matthew Garrett, linux-integrity, Linux Kernel Mailing List,
	jarkko, Peter Huewe, Jason Gunthorpe, James Bottomley

Hi,

On 11/19/20 6:05 PM, Jerry Snitselaar wrote:
> 
> Hans de Goede @ 2020-11-19 07:42 MST:
> 
>> Hi,
>>
>> On 11/19/20 7:36 AM, Jerry Snitselaar wrote:
>>>
>>> Matthew Garrett @ 2020-10-15 15:39 MST:
>>>
>>>> On Thu, Oct 15, 2020 at 2:44 PM Jerry Snitselaar <jsnitsel@redhat.com> wrote:
>>>>>
>>>>> There is a misconfiguration in the bios of the gpio pin used for the
>>>>> interrupt in the T490s. When interrupts are enabled in the tpm_tis
>>>>> driver code this results in an interrupt storm. This was initially
>>>>> reported when we attempted to enable the interrupt code in the tpm_tis
>>>>> driver, which previously wasn't setting a flag to enable it. Due to
>>>>> the reports of the interrupt storm that code was reverted and we went back
>>>>> to polling instead of using interrupts. Now that we know the T490s problem
>>>>> is a firmware issue, add code to check if the system is a T490s and
>>>>> disable interrupts if that is the case. This will allow us to enable
>>>>> interrupts for everyone else. If the user has a fixed bios they can
>>>>> force the enabling of interrupts with tpm_tis.interrupts=1 on the
>>>>> kernel command line.
>>>>
>>>> I think an implication of this is that systems haven't been
>>>> well-tested with interrupts enabled. In general when we've found a
>>>> firmware issue in one place it ends up happening elsewhere as well, so
>>>> it wouldn't surprise me if there are other machines that will also be
>>>> unhappy with interrupts enabled. Would it be possible to automatically
>>>> detect this case (eg, if we get more than a certain number of
>>>> interrupts in a certain timeframe immediately after enabling the
>>>> interrupt) and automatically fall back to polling in that case? It
>>>> would also mean that users with fixed firmware wouldn't need to pass a
>>>> parameter.
>>>
>>> I believe Matthew is correct here. I found another system today
>>> with completely different vendor for both the system and the tpm chip.
>>> In addition another Lenovo model, the L490, has the issue.
>>>
>>> This initial attempt at a solution like Matthew suggested works on
>>> the system I found today, but I imagine it is all sorts of wrong.
>>> In the 2 systems where I've seen it, there are about 100000 interrupts
>>> in around 1.5 seconds, and then the irq code shuts down the interrupt
>>> because they aren't being handled.
>>
>> Is that with your patch? The IRQ should be silenced as soon as
>> devm_free_irq(chip->dev.parent, priv->irq, chip); is called.
>>
> 
> No that is just with James' patchset that enables interrupts for
> tpm_tis. It looks like the irq is firing, but the tpm's int_status
> register is clear, so the handler immediately returns IRQ_NONE. After
> that happens 100000 times the core irq code shuts down the irq, but it
> isn't released so I could still see the stats in /proc/interrupts.

I see, yes I have seen this behavior on the X1C8 with a pre-production BIOS.

> With
> my attempt below on top of that patchset it releases the irq. I had to
> stick the check prior to it checking the int_status register because it
> is cleared and the handler returns,

Ack.

> and I couldn't do the devm_free_irq
> directly in tis_int_handler, so I tried sticking it in tpm_tis_send
> where the other odd irq testing code was already located. I'm not sure
> if there is another place that would work better for calling the
> devm_free_irq.

Adding it together with the other IRQ testing related code seems fine
to me.

>> Depending on if we can get your storm-detection to work or not,
>> we might also choose to just never try to use the IRQ (at least on
>> x86 systems). AFAIK the TPM is never used for high-throughput stuff
>> so the polling overhead should not be a big deal (and I'm getting the feeling
>> that Windows always polls).
>>
> 
> I was wondering about Windows as well. In addition to the Lenovo systems
> which the T490s had Nuvoton tpm, the system I found yesterday was a development
> system we have from a partner with an Infineon tpm. Dan Williams has
> seen it internally at Intel as well on some system.

Sounds to me like Windows never uses the IRQ, so we get the fun of finding
all the untested firmware bugs.

So if we cannot get this detection code to work reliable, maybe we should
just not use the IRQ at all, at least on X86 + ACPI systems?

Anyways lets try this storm-detection thing first, but I have the feeling
we should not spend too much time on this. Just outright disabling IRQ
support might be better.

REgards,

Hans


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] tpm_tis: Disable interrupts on ThinkPad T490s
  2020-11-19  6:36   ` Jerry Snitselaar
  2020-11-19 14:42     ` Hans de Goede
@ 2020-11-24  3:26     ` Jarkko Sakkinen
  2020-11-24 17:52       ` Jerry Snitselaar
  1 sibling, 1 reply; 22+ messages in thread
From: Jarkko Sakkinen @ 2020-11-24  3:26 UTC (permalink / raw)
  To: Jerry Snitselaar
  Cc: Matthew Garrett, linux-integrity, Linux Kernel Mailing List,
	Peter Huewe, Jason Gunthorpe, Hans de Goede, James Bottomley

On Wed, Nov 18, 2020 at 11:36:20PM -0700, Jerry Snitselaar wrote:
> 
> Matthew Garrett @ 2020-10-15 15:39 MST:
> 
> > On Thu, Oct 15, 2020 at 2:44 PM Jerry Snitselaar <jsnitsel@redhat.com> wrote:
> >>
> >> There is a misconfiguration in the bios of the gpio pin used for the
> >> interrupt in the T490s. When interrupts are enabled in the tpm_tis
> >> driver code this results in an interrupt storm. This was initially
> >> reported when we attempted to enable the interrupt code in the tpm_tis
> >> driver, which previously wasn't setting a flag to enable it. Due to
> >> the reports of the interrupt storm that code was reverted and we went back
> >> to polling instead of using interrupts. Now that we know the T490s problem
> >> is a firmware issue, add code to check if the system is a T490s and
> >> disable interrupts if that is the case. This will allow us to enable
> >> interrupts for everyone else. If the user has a fixed bios they can
> >> force the enabling of interrupts with tpm_tis.interrupts=1 on the
> >> kernel command line.
> >
> > I think an implication of this is that systems haven't been
> > well-tested with interrupts enabled. In general when we've found a
> > firmware issue in one place it ends up happening elsewhere as well, so
> > it wouldn't surprise me if there are other machines that will also be
> > unhappy with interrupts enabled. Would it be possible to automatically
> > detect this case (eg, if we get more than a certain number of
> > interrupts in a certain timeframe immediately after enabling the
> > interrupt) and automatically fall back to polling in that case? It
> > would also mean that users with fixed firmware wouldn't need to pass a
> > parameter.
> 
> I believe Matthew is correct here. I found another system today
> with completely different vendor for both the system and the tpm chip.
> In addition another Lenovo model, the L490, has the issue.
> 
> This initial attempt at a solution like Matthew suggested works on
> the system I found today, but I imagine it is all sorts of wrong.
> In the 2 systems where I've seen it, there are about 100000 interrupts
> in around 1.5 seconds, and then the irq code shuts down the interrupt
> because they aren't being handled.
> 
> 
> diff --git a/drivers/char/tpm/tpm_tis_core.c b/drivers/char/tpm/tpm_tis_core.c
> index 49ae09ac604f..478e9d02a3fa 100644
> --- a/drivers/char/tpm/tpm_tis_core.c
> +++ b/drivers/char/tpm/tpm_tis_core.c
> @@ -27,6 +27,11 @@
>  #include "tpm.h"
>  #include "tpm_tis_core.h"
> 
> +static unsigned int time_start = 0;
> +static bool storm_check = true;
> +static bool storm_killed = false;
> +static u32 irqs_fired = 0;

Maybe kstat_irqs() would be a better idea than ad hoc stats.

> +
>  static void tpm_tis_clkrun_enable(struct tpm_chip *chip, bool value);
> 
>  static void tpm_tis_enable_interrupt(struct tpm_chip *chip, u8 mask)
> @@ -464,25 +469,31 @@ static int tpm_tis_send_data(struct tpm_chip *chip, const u8 *buf, size_t len)
>         return rc;
>  }
> 
> -static void disable_interrupts(struct tpm_chip *chip)
> +static void __disable_interrupts(struct tpm_chip *chip)
>  {
>         struct tpm_tis_data *priv = dev_get_drvdata(&chip->dev);
>         u32 intmask;
>         int rc;
> 
> -       if (priv->irq == 0)
> -               return;
> -
>         rc = tpm_tis_read32(priv, TPM_INT_ENABLE(priv->locality), &intmask);
>         if (rc < 0)
>                 intmask = 0;
> 
>         intmask &= ~TPM_GLOBAL_INT_ENABLE;
>         rc = tpm_tis_write32(priv, TPM_INT_ENABLE(priv->locality), intmask);
> +       chip->flags &= ~TPM_CHIP_FLAG_IRQ;
> +}
> +
> +static void disable_interrupts(struct tpm_chip *chip)
> +{
> +       struct tpm_tis_data *priv = dev_get_drvdata(&chip->dev);
> 
> +       if (priv->irq == 0)
> +               return;
> +
> +       __disable_interrupts(chip);
>         devm_free_irq(chip->dev.parent, priv->irq, chip);
>         priv->irq = 0;
> -       chip->flags &= ~TPM_CHIP_FLAG_IRQ;
>  }
> 
>  /*
> @@ -528,6 +539,12 @@ static int tpm_tis_send(struct tpm_chip *chip, u8 *buf, size_t len)
>         int rc, irq;
>         struct tpm_tis_data *priv = dev_get_drvdata(&chip->dev);
> 
> +       if (unlikely(storm_killed)) {
> +               devm_free_irq(chip->dev.parent, priv->irq, chip);
> +               priv->irq = 0;
> +               storm_killed = false;
> +       }

OK this kind of bad solution because if tpm_tis_send() is not called,
then IRQ is never freed. AFAIK, devres_* do not sleep but use spin
lock, i.e. you could render out both storm_check and storm_killed.

> +
>         if (!(chip->flags & TPM_CHIP_FLAG_IRQ) || priv->irq_tested)
>                 return tpm_tis_send_main(chip, buf, len);
> 
> @@ -748,6 +765,21 @@ static irqreturn_t tis_int_handler(int dummy, void *dev_id)
>         u32 interrupt;
>         int i, rc;
> 
> +       if (storm_check) {
> +               irqs_fired++;
> +
> +               if (!time_start) {
> +                       time_start = jiffies_to_msecs(jiffies);
> +               } else if ((irqs_fired > 1000) && (jiffies_to_msecs(jiffies) - jiffies < 500)) {
> +                       __disable_interrupts(chip);
> +                       storm_check = false;
> +                       storm_killed = true;
> +                       return IRQ_HANDLED;
> +               } else if ((jiffies_to_msecs(jiffies) - time_start > 500) && (irqs_fired < 1000)) {
> +                       storm_check = false;
> +               }
> +       }
> +
>         rc = tpm_tis_read32(priv, TPM_INT_STATUS(priv->locality), &interrupt);
>         if (rc < 0)
>                 return IRQ_NONE;
> 
> 

/Jarkko

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] tpm_tis: Disable interrupts on ThinkPad T490s
  2020-11-19 14:42     ` Hans de Goede
  2020-11-19 17:05       ` Jerry Snitselaar
@ 2020-11-24  3:27       ` Jarkko Sakkinen
  2020-11-24  3:29         ` Jarkko Sakkinen
  1 sibling, 1 reply; 22+ messages in thread
From: Jarkko Sakkinen @ 2020-11-24  3:27 UTC (permalink / raw)
  To: Hans de Goede
  Cc: Jerry Snitselaar, Matthew Garrett, linux-integrity,
	Linux Kernel Mailing List, Peter Huewe, Jason Gunthorpe,
	James Bottomley

On Thu, Nov 19, 2020 at 03:42:35PM +0100, Hans de Goede wrote:
> Hi,
> 
> On 11/19/20 7:36 AM, Jerry Snitselaar wrote:
> > 
> > Matthew Garrett @ 2020-10-15 15:39 MST:
> > 
> >> On Thu, Oct 15, 2020 at 2:44 PM Jerry Snitselaar <jsnitsel@redhat.com> wrote:
> >>>
> >>> There is a misconfiguration in the bios of the gpio pin used for the
> >>> interrupt in the T490s. When interrupts are enabled in the tpm_tis
> >>> driver code this results in an interrupt storm. This was initially
> >>> reported when we attempted to enable the interrupt code in the tpm_tis
> >>> driver, which previously wasn't setting a flag to enable it. Due to
> >>> the reports of the interrupt storm that code was reverted and we went back
> >>> to polling instead of using interrupts. Now that we know the T490s problem
> >>> is a firmware issue, add code to check if the system is a T490s and
> >>> disable interrupts if that is the case. This will allow us to enable
> >>> interrupts for everyone else. If the user has a fixed bios they can
> >>> force the enabling of interrupts with tpm_tis.interrupts=1 on the
> >>> kernel command line.
> >>
> >> I think an implication of this is that systems haven't been
> >> well-tested with interrupts enabled. In general when we've found a
> >> firmware issue in one place it ends up happening elsewhere as well, so
> >> it wouldn't surprise me if there are other machines that will also be
> >> unhappy with interrupts enabled. Would it be possible to automatically
> >> detect this case (eg, if we get more than a certain number of
> >> interrupts in a certain timeframe immediately after enabling the
> >> interrupt) and automatically fall back to polling in that case? It
> >> would also mean that users with fixed firmware wouldn't need to pass a
> >> parameter.
> > 
> > I believe Matthew is correct here. I found another system today
> > with completely different vendor for both the system and the tpm chip.
> > In addition another Lenovo model, the L490, has the issue.
> > 
> > This initial attempt at a solution like Matthew suggested works on
> > the system I found today, but I imagine it is all sorts of wrong.
> > In the 2 systems where I've seen it, there are about 100000 interrupts
> > in around 1.5 seconds, and then the irq code shuts down the interrupt
> > because they aren't being handled.
> 
> Is that with your patch? The IRQ should be silenced as soon as
> devm_free_irq(chip->dev.parent, priv->irq, chip); is called.
> 
> Depending on if we can get your storm-detection to work or not,
> we might also choose to just never try to use the IRQ (at least on
> x86 systems). AFAIK the TPM is never used for high-throughput stuff
> so the polling overhead should not be a big deal (and I'm getting the feeling
> that Windows always polls).
> 
> Regards,
> 
> Hans

Yeah, this is what I've been wondering for a while. Why could not we
just strip off IRQ code? Why does it matter?

/Jarkko

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] tpm_tis: Disable interrupts on ThinkPad T490s
  2020-11-24  3:27       ` Jarkko Sakkinen
@ 2020-11-24  3:29         ` Jarkko Sakkinen
  0 siblings, 0 replies; 22+ messages in thread
From: Jarkko Sakkinen @ 2020-11-24  3:29 UTC (permalink / raw)
  To: Hans de Goede
  Cc: Jerry Snitselaar, Matthew Garrett, linux-integrity,
	Linux Kernel Mailing List, Peter Huewe, Jason Gunthorpe,
	James Bottomley

On Tue, Nov 24, 2020 at 05:27:30AM +0200, Jarkko Sakkinen wrote:
> On Thu, Nov 19, 2020 at 03:42:35PM +0100, Hans de Goede wrote:
> > Hi,
> > 
> > On 11/19/20 7:36 AM, Jerry Snitselaar wrote:
> > > 
> > > Matthew Garrett @ 2020-10-15 15:39 MST:
> > > 
> > >> On Thu, Oct 15, 2020 at 2:44 PM Jerry Snitselaar <jsnitsel@redhat.com> wrote:
> > >>>
> > >>> There is a misconfiguration in the bios of the gpio pin used for the
> > >>> interrupt in the T490s. When interrupts are enabled in the tpm_tis
> > >>> driver code this results in an interrupt storm. This was initially
> > >>> reported when we attempted to enable the interrupt code in the tpm_tis
> > >>> driver, which previously wasn't setting a flag to enable it. Due to
> > >>> the reports of the interrupt storm that code was reverted and we went back
> > >>> to polling instead of using interrupts. Now that we know the T490s problem
> > >>> is a firmware issue, add code to check if the system is a T490s and
> > >>> disable interrupts if that is the case. This will allow us to enable
> > >>> interrupts for everyone else. If the user has a fixed bios they can
> > >>> force the enabling of interrupts with tpm_tis.interrupts=1 on the
> > >>> kernel command line.
> > >>
> > >> I think an implication of this is that systems haven't been
> > >> well-tested with interrupts enabled. In general when we've found a
> > >> firmware issue in one place it ends up happening elsewhere as well, so
> > >> it wouldn't surprise me if there are other machines that will also be
> > >> unhappy with interrupts enabled. Would it be possible to automatically
> > >> detect this case (eg, if we get more than a certain number of
> > >> interrupts in a certain timeframe immediately after enabling the
> > >> interrupt) and automatically fall back to polling in that case? It
> > >> would also mean that users with fixed firmware wouldn't need to pass a
> > >> parameter.
> > > 
> > > I believe Matthew is correct here. I found another system today
> > > with completely different vendor for both the system and the tpm chip.
> > > In addition another Lenovo model, the L490, has the issue.
> > > 
> > > This initial attempt at a solution like Matthew suggested works on
> > > the system I found today, but I imagine it is all sorts of wrong.
> > > In the 2 systems where I've seen it, there are about 100000 interrupts
> > > in around 1.5 seconds, and then the irq code shuts down the interrupt
> > > because they aren't being handled.
> > 
> > Is that with your patch? The IRQ should be silenced as soon as
> > devm_free_irq(chip->dev.parent, priv->irq, chip); is called.
> > 
> > Depending on if we can get your storm-detection to work or not,
> > we might also choose to just never try to use the IRQ (at least on
> > x86 systems). AFAIK the TPM is never used for high-throughput stuff
> > so the polling overhead should not be a big deal (and I'm getting the feeling
> > that Windows always polls).
> > 
> > Regards,
> > 
> > Hans
> 
> Yeah, this is what I've been wondering for a while. Why could not we
> just strip off IRQ code? Why does it matter?

And we DO NOT use interrupts in tpm_crb and nobody has ever complained.

/Jarkko

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] tpm_tis: Disable interrupts on ThinkPad T490s
  2020-11-24  3:26     ` Jarkko Sakkinen
@ 2020-11-24 17:52       ` Jerry Snitselaar
  2020-11-24 18:10         ` James Bottomley
                           ` (2 more replies)
  0 siblings, 3 replies; 22+ messages in thread
From: Jerry Snitselaar @ 2020-11-24 17:52 UTC (permalink / raw)
  To: Jarkko Sakkinen
  Cc: Matthew Garrett, linux-integrity, Linux Kernel Mailing List,
	Peter Huewe, Jason Gunthorpe, Hans de Goede, James Bottomley


Jarkko Sakkinen @ 2020-11-23 20:26 MST:

> On Wed, Nov 18, 2020 at 11:36:20PM -0700, Jerry Snitselaar wrote:
>> 
>> Matthew Garrett @ 2020-10-15 15:39 MST:
>> 
>> > On Thu, Oct 15, 2020 at 2:44 PM Jerry Snitselaar <jsnitsel@redhat.com> wrote:
>> >>
>> >> There is a misconfiguration in the bios of the gpio pin used for the
>> >> interrupt in the T490s. When interrupts are enabled in the tpm_tis
>> >> driver code this results in an interrupt storm. This was initially
>> >> reported when we attempted to enable the interrupt code in the tpm_tis
>> >> driver, which previously wasn't setting a flag to enable it. Due to
>> >> the reports of the interrupt storm that code was reverted and we went back
>> >> to polling instead of using interrupts. Now that we know the T490s problem
>> >> is a firmware issue, add code to check if the system is a T490s and
>> >> disable interrupts if that is the case. This will allow us to enable
>> >> interrupts for everyone else. If the user has a fixed bios they can
>> >> force the enabling of interrupts with tpm_tis.interrupts=1 on the
>> >> kernel command line.
>> >
>> > I think an implication of this is that systems haven't been
>> > well-tested with interrupts enabled. In general when we've found a
>> > firmware issue in one place it ends up happening elsewhere as well, so
>> > it wouldn't surprise me if there are other machines that will also be
>> > unhappy with interrupts enabled. Would it be possible to automatically
>> > detect this case (eg, if we get more than a certain number of
>> > interrupts in a certain timeframe immediately after enabling the
>> > interrupt) and automatically fall back to polling in that case? It
>> > would also mean that users with fixed firmware wouldn't need to pass a
>> > parameter.
>> 
>> I believe Matthew is correct here. I found another system today
>> with completely different vendor for both the system and the tpm chip.
>> In addition another Lenovo model, the L490, has the issue.
>> 
>> This initial attempt at a solution like Matthew suggested works on
>> the system I found today, but I imagine it is all sorts of wrong.
>> In the 2 systems where I've seen it, there are about 100000 interrupts
>> in around 1.5 seconds, and then the irq code shuts down the interrupt
>> because they aren't being handled.
>> 
>> 
>> diff --git a/drivers/char/tpm/tpm_tis_core.c b/drivers/char/tpm/tpm_tis_core.c
>> index 49ae09ac604f..478e9d02a3fa 100644
>> --- a/drivers/char/tpm/tpm_tis_core.c
>> +++ b/drivers/char/tpm/tpm_tis_core.c
>> @@ -27,6 +27,11 @@
>>  #include "tpm.h"
>>  #include "tpm_tis_core.h"
>> 
>> +static unsigned int time_start = 0;
>> +static bool storm_check = true;
>> +static bool storm_killed = false;
>> +static u32 irqs_fired = 0;
>
> Maybe kstat_irqs() would be a better idea than ad hoc stats.
>

Thanks, yes that would be better.

>> +
>>  static void tpm_tis_clkrun_enable(struct tpm_chip *chip, bool value);
>> 
>>  static void tpm_tis_enable_interrupt(struct tpm_chip *chip, u8 mask)
>> @@ -464,25 +469,31 @@ static int tpm_tis_send_data(struct tpm_chip *chip, const u8 *buf, size_t len)
>>         return rc;
>>  }
>> 
>> -static void disable_interrupts(struct tpm_chip *chip)
>> +static void __disable_interrupts(struct tpm_chip *chip)
>>  {
>>         struct tpm_tis_data *priv = dev_get_drvdata(&chip->dev);
>>         u32 intmask;
>>         int rc;
>> 
>> -       if (priv->irq == 0)
>> -               return;
>> -
>>         rc = tpm_tis_read32(priv, TPM_INT_ENABLE(priv->locality), &intmask);
>>         if (rc < 0)
>>                 intmask = 0;
>> 
>>         intmask &= ~TPM_GLOBAL_INT_ENABLE;
>>         rc = tpm_tis_write32(priv, TPM_INT_ENABLE(priv->locality), intmask);
>> +       chip->flags &= ~TPM_CHIP_FLAG_IRQ;
>> +}
>> +
>> +static void disable_interrupts(struct tpm_chip *chip)
>> +{
>> +       struct tpm_tis_data *priv = dev_get_drvdata(&chip->dev);
>> 
>> +       if (priv->irq == 0)
>> +               return;
>> +
>> +       __disable_interrupts(chip);
>>         devm_free_irq(chip->dev.parent, priv->irq, chip);
>>         priv->irq = 0;
>> -       chip->flags &= ~TPM_CHIP_FLAG_IRQ;
>>  }
>> 
>>  /*
>> @@ -528,6 +539,12 @@ static int tpm_tis_send(struct tpm_chip *chip, u8 *buf, size_t len)
>>         int rc, irq;
>>         struct tpm_tis_data *priv = dev_get_drvdata(&chip->dev);
>> 
>> +       if (unlikely(storm_killed)) {
>> +               devm_free_irq(chip->dev.parent, priv->irq, chip);
>> +               priv->irq = 0;
>> +               storm_killed = false;
>> +       }
>
> OK this kind of bad solution because if tpm_tis_send() is not called,
> then IRQ is never freed. AFAIK, devres_* do not sleep but use spin
> lock, i.e. you could render out both storm_check and storm_killed.
>

Is there a way to flag it for freeing later while in an interrupt
context? I'm not sure where to clean it up since devm_free_irq can't be
called in tis_int_handler.

Before diving further into that though, does anyone else have an opinion
on ripping out the irq code, and just using polling? We've been only
polling since 2015 anyways.

>> +
>>         if (!(chip->flags & TPM_CHIP_FLAG_IRQ) || priv->irq_tested)
>>                 return tpm_tis_send_main(chip, buf, len);
>> 
>> @@ -748,6 +765,21 @@ static irqreturn_t tis_int_handler(int dummy, void *dev_id)
>>         u32 interrupt;
>>         int i, rc;
>> 
>> +       if (storm_check) {
>> +               irqs_fired++;
>> +
>> +               if (!time_start) {
>> +                       time_start = jiffies_to_msecs(jiffies);
>> +               } else if ((irqs_fired > 1000) && (jiffies_to_msecs(jiffies) - jiffies < 500)) {
>> +                       __disable_interrupts(chip);
>> +                       storm_check = false;
>> +                       storm_killed = true;
>> +                       return IRQ_HANDLED;
>> +               } else if ((jiffies_to_msecs(jiffies) - time_start > 500) && (irqs_fired < 1000)) {
>> +                       storm_check = false;
>> +               }
>> +       }
>> +
>>         rc = tpm_tis_read32(priv, TPM_INT_STATUS(priv->locality), &interrupt);
>>         if (rc < 0)
>>                 return IRQ_NONE;
>> 
>> 
>
> /Jarkko


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] tpm_tis: Disable interrupts on ThinkPad T490s
  2020-11-24 17:52       ` Jerry Snitselaar
@ 2020-11-24 18:10         ` James Bottomley
  2020-11-29  3:21           ` Jarkko Sakkinen
  2020-11-24 21:45         ` Hans de Goede
  2020-11-29  3:18         ` Jarkko Sakkinen
  2 siblings, 1 reply; 22+ messages in thread
From: James Bottomley @ 2020-11-24 18:10 UTC (permalink / raw)
  To: Jerry Snitselaar, Jarkko Sakkinen
  Cc: Matthew Garrett, linux-integrity, Linux Kernel Mailing List,
	Peter Huewe, Jason Gunthorpe, Hans de Goede

On Tue, 2020-11-24 at 10:52 -0700, Jerry Snitselaar wrote:
> Before diving further into that though, does anyone else have an
> opinion on ripping out the irq code, and just using polling? We've
> been only polling since 2015 anyways.

Well only a biased one, obviously: polling causes large amounts of busy
waiting, which is a waste of CPU resources and does increase the time
it takes us to do TPM operations ... not a concern if you're doing long
computation ones, like signatures, but it is a problem for short
operations like bulk updates of PCRs.  The other potential issue, as we
saw with atmel is that if you prod the chip too often (which you have
to do with polling) you risk upsetting it.  We've spent ages trying to
tune the polling parameters to balance reduction of busy wait with chip
upset and still, apparently, not quite got it right.  If the TPM has a
functioning IRQ then it gets us out of the whole polling mess entirely.
The big question is how many chips that report an IRQ actually have a
malfunctioning one?

James




^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] tpm_tis: Disable interrupts on ThinkPad T490s
  2020-11-24 17:52       ` Jerry Snitselaar
  2020-11-24 18:10         ` James Bottomley
@ 2020-11-24 21:45         ` Hans de Goede
  2020-11-29  3:23           ` Jarkko Sakkinen
  2020-11-29  3:18         ` Jarkko Sakkinen
  2 siblings, 1 reply; 22+ messages in thread
From: Hans de Goede @ 2020-11-24 21:45 UTC (permalink / raw)
  To: Jerry Snitselaar, Jarkko Sakkinen
  Cc: Matthew Garrett, linux-integrity, Linux Kernel Mailing List,
	Peter Huewe, Jason Gunthorpe, James Bottomley

Hi,

On 11/24/20 6:52 PM, Jerry Snitselaar wrote:
> 
> Jarkko Sakkinen @ 2020-11-23 20:26 MST:
> 
>> On Wed, Nov 18, 2020 at 11:36:20PM -0700, Jerry Snitselaar wrote:
>>>
>>> Matthew Garrett @ 2020-10-15 15:39 MST:
>>>
>>>> On Thu, Oct 15, 2020 at 2:44 PM Jerry Snitselaar <jsnitsel@redhat.com> wrote:
>>>>>
>>>>> There is a misconfiguration in the bios of the gpio pin used for the
>>>>> interrupt in the T490s. When interrupts are enabled in the tpm_tis
>>>>> driver code this results in an interrupt storm. This was initially
>>>>> reported when we attempted to enable the interrupt code in the tpm_tis
>>>>> driver, which previously wasn't setting a flag to enable it. Due to
>>>>> the reports of the interrupt storm that code was reverted and we went back
>>>>> to polling instead of using interrupts. Now that we know the T490s problem
>>>>> is a firmware issue, add code to check if the system is a T490s and
>>>>> disable interrupts if that is the case. This will allow us to enable
>>>>> interrupts for everyone else. If the user has a fixed bios they can
>>>>> force the enabling of interrupts with tpm_tis.interrupts=1 on the
>>>>> kernel command line.
>>>>
>>>> I think an implication of this is that systems haven't been
>>>> well-tested with interrupts enabled. In general when we've found a
>>>> firmware issue in one place it ends up happening elsewhere as well, so
>>>> it wouldn't surprise me if there are other machines that will also be
>>>> unhappy with interrupts enabled. Would it be possible to automatically
>>>> detect this case (eg, if we get more than a certain number of
>>>> interrupts in a certain timeframe immediately after enabling the
>>>> interrupt) and automatically fall back to polling in that case? It
>>>> would also mean that users with fixed firmware wouldn't need to pass a
>>>> parameter.
>>>
>>> I believe Matthew is correct here. I found another system today
>>> with completely different vendor for both the system and the tpm chip.
>>> In addition another Lenovo model, the L490, has the issue.
>>>
>>> This initial attempt at a solution like Matthew suggested works on
>>> the system I found today, but I imagine it is all sorts of wrong.
>>> In the 2 systems where I've seen it, there are about 100000 interrupts
>>> in around 1.5 seconds, and then the irq code shuts down the interrupt
>>> because they aren't being handled.
>>>
>>>
>>> diff --git a/drivers/char/tpm/tpm_tis_core.c b/drivers/char/tpm/tpm_tis_core.c
>>> index 49ae09ac604f..478e9d02a3fa 100644
>>> --- a/drivers/char/tpm/tpm_tis_core.c
>>> +++ b/drivers/char/tpm/tpm_tis_core.c
>>> @@ -27,6 +27,11 @@
>>>  #include "tpm.h"
>>>  #include "tpm_tis_core.h"
>>>
>>> +static unsigned int time_start = 0;
>>> +static bool storm_check = true;
>>> +static bool storm_killed = false;
>>> +static u32 irqs_fired = 0;
>>
>> Maybe kstat_irqs() would be a better idea than ad hoc stats.
>>
> 
> Thanks, yes that would be better.
> 
>>> +
>>>  static void tpm_tis_clkrun_enable(struct tpm_chip *chip, bool value);
>>>
>>>  static void tpm_tis_enable_interrupt(struct tpm_chip *chip, u8 mask)
>>> @@ -464,25 +469,31 @@ static int tpm_tis_send_data(struct tpm_chip *chip, const u8 *buf, size_t len)
>>>         return rc;
>>>  }
>>>
>>> -static void disable_interrupts(struct tpm_chip *chip)
>>> +static void __disable_interrupts(struct tpm_chip *chip)
>>>  {
>>>         struct tpm_tis_data *priv = dev_get_drvdata(&chip->dev);
>>>         u32 intmask;
>>>         int rc;
>>>
>>> -       if (priv->irq == 0)
>>> -               return;
>>> -
>>>         rc = tpm_tis_read32(priv, TPM_INT_ENABLE(priv->locality), &intmask);
>>>         if (rc < 0)
>>>                 intmask = 0;
>>>
>>>         intmask &= ~TPM_GLOBAL_INT_ENABLE;
>>>         rc = tpm_tis_write32(priv, TPM_INT_ENABLE(priv->locality), intmask);
>>> +       chip->flags &= ~TPM_CHIP_FLAG_IRQ;
>>> +}
>>> +
>>> +static void disable_interrupts(struct tpm_chip *chip)
>>> +{
>>> +       struct tpm_tis_data *priv = dev_get_drvdata(&chip->dev);
>>>
>>> +       if (priv->irq == 0)
>>> +               return;
>>> +
>>> +       __disable_interrupts(chip);
>>>         devm_free_irq(chip->dev.parent, priv->irq, chip);
>>>         priv->irq = 0;
>>> -       chip->flags &= ~TPM_CHIP_FLAG_IRQ;
>>>  }
>>>
>>>  /*
>>> @@ -528,6 +539,12 @@ static int tpm_tis_send(struct tpm_chip *chip, u8 *buf, size_t len)
>>>         int rc, irq;
>>>         struct tpm_tis_data *priv = dev_get_drvdata(&chip->dev);
>>>
>>> +       if (unlikely(storm_killed)) {
>>> +               devm_free_irq(chip->dev.parent, priv->irq, chip);
>>> +               priv->irq = 0;
>>> +               storm_killed = false;
>>> +       }
>>
>> OK this kind of bad solution because if tpm_tis_send() is not called,
>> then IRQ is never freed. AFAIK, devres_* do not sleep but use spin
>> lock, i.e. you could render out both storm_check and storm_killed.
>>
> 
> Is there a way to flag it for freeing later while in an interrupt
> context? I'm not sure where to clean it up since devm_free_irq can't be
> called in tis_int_handler.

You could add a workqueue work-struct just for this and queue that up
to do the free when you detect the storm. That will then run pretty much
immediately, avoiding the storm going on for (much) longer.

> Before diving further into that though, does anyone else have an opinion
> on ripping out the irq code, and just using polling? We've been only
> polling since 2015 anyways.

Given James Bottomley's reply I guess it would be worthwhile to get the
storm detection to work.

Regards,

Hans


> 
>>> +
>>>         if (!(chip->flags & TPM_CHIP_FLAG_IRQ) || priv->irq_tested)
>>>                 return tpm_tis_send_main(chip, buf, len);
>>>
>>> @@ -748,6 +765,21 @@ static irqreturn_t tis_int_handler(int dummy, void *dev_id)
>>>         u32 interrupt;
>>>         int i, rc;
>>>
>>> +       if (storm_check) {
>>> +               irqs_fired++;
>>> +
>>> +               if (!time_start) {
>>> +                       time_start = jiffies_to_msecs(jiffies);
>>> +               } else if ((irqs_fired > 1000) && (jiffies_to_msecs(jiffies) - jiffies < 500)) {
>>> +                       __disable_interrupts(chip);
>>> +                       storm_check = false;
>>> +                       storm_killed = true;
>>> +                       return IRQ_HANDLED;
>>> +               } else if ((jiffies_to_msecs(jiffies) - time_start > 500) && (irqs_fired < 1000)) {
>>> +                       storm_check = false;
>>> +               }
>>> +       }
>>> +
>>>         rc = tpm_tis_read32(priv, TPM_INT_STATUS(priv->locality), &interrupt);
>>>         if (rc < 0)
>>>                 return IRQ_NONE;
>>>
>>>
>>
>> /Jarkko
> 


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] tpm_tis: Disable interrupts on ThinkPad T490s
  2020-11-24 17:52       ` Jerry Snitselaar
  2020-11-24 18:10         ` James Bottomley
  2020-11-24 21:45         ` Hans de Goede
@ 2020-11-29  3:18         ` Jarkko Sakkinen
  2 siblings, 0 replies; 22+ messages in thread
From: Jarkko Sakkinen @ 2020-11-29  3:18 UTC (permalink / raw)
  To: Jerry Snitselaar
  Cc: Matthew Garrett, linux-integrity, Linux Kernel Mailing List,
	Peter Huewe, Jason Gunthorpe, Hans de Goede, James Bottomley

On Tue, Nov 24, 2020 at 10:52:56AM -0700, Jerry Snitselaar wrote:
> 
> Jarkko Sakkinen @ 2020-11-23 20:26 MST:
> 
> > On Wed, Nov 18, 2020 at 11:36:20PM -0700, Jerry Snitselaar wrote:
> >> 
> >> Matthew Garrett @ 2020-10-15 15:39 MST:
> >> 
> >> > On Thu, Oct 15, 2020 at 2:44 PM Jerry Snitselaar <jsnitsel@redhat.com> wrote:
> >> >>
> >> >> There is a misconfiguration in the bios of the gpio pin used for the
> >> >> interrupt in the T490s. When interrupts are enabled in the tpm_tis
> >> >> driver code this results in an interrupt storm. This was initially
> >> >> reported when we attempted to enable the interrupt code in the tpm_tis
> >> >> driver, which previously wasn't setting a flag to enable it. Due to
> >> >> the reports of the interrupt storm that code was reverted and we went back
> >> >> to polling instead of using interrupts. Now that we know the T490s problem
> >> >> is a firmware issue, add code to check if the system is a T490s and
> >> >> disable interrupts if that is the case. This will allow us to enable
> >> >> interrupts for everyone else. If the user has a fixed bios they can
> >> >> force the enabling of interrupts with tpm_tis.interrupts=1 on the
> >> >> kernel command line.
> >> >
> >> > I think an implication of this is that systems haven't been
> >> > well-tested with interrupts enabled. In general when we've found a
> >> > firmware issue in one place it ends up happening elsewhere as well, so
> >> > it wouldn't surprise me if there are other machines that will also be
> >> > unhappy with interrupts enabled. Would it be possible to automatically
> >> > detect this case (eg, if we get more than a certain number of
> >> > interrupts in a certain timeframe immediately after enabling the
> >> > interrupt) and automatically fall back to polling in that case? It
> >> > would also mean that users with fixed firmware wouldn't need to pass a
> >> > parameter.
> >> 
> >> I believe Matthew is correct here. I found another system today
> >> with completely different vendor for both the system and the tpm chip.
> >> In addition another Lenovo model, the L490, has the issue.
> >> 
> >> This initial attempt at a solution like Matthew suggested works on
> >> the system I found today, but I imagine it is all sorts of wrong.
> >> In the 2 systems where I've seen it, there are about 100000 interrupts
> >> in around 1.5 seconds, and then the irq code shuts down the interrupt
> >> because they aren't being handled.
> >> 
> >> 
> >> diff --git a/drivers/char/tpm/tpm_tis_core.c b/drivers/char/tpm/tpm_tis_core.c
> >> index 49ae09ac604f..478e9d02a3fa 100644
> >> --- a/drivers/char/tpm/tpm_tis_core.c
> >> +++ b/drivers/char/tpm/tpm_tis_core.c
> >> @@ -27,6 +27,11 @@
> >>  #include "tpm.h"
> >>  #include "tpm_tis_core.h"
> >> 
> >> +static unsigned int time_start = 0;
> >> +static bool storm_check = true;
> >> +static bool storm_killed = false;
> >> +static u32 irqs_fired = 0;
> >
> > Maybe kstat_irqs() would be a better idea than ad hoc stats.
> >
> 
> Thanks, yes that would be better.
> 
> >> +
> >>  static void tpm_tis_clkrun_enable(struct tpm_chip *chip, bool value);
> >> 
> >>  static void tpm_tis_enable_interrupt(struct tpm_chip *chip, u8 mask)
> >> @@ -464,25 +469,31 @@ static int tpm_tis_send_data(struct tpm_chip *chip, const u8 *buf, size_t len)
> >>         return rc;
> >>  }
> >> 
> >> -static void disable_interrupts(struct tpm_chip *chip)
> >> +static void __disable_interrupts(struct tpm_chip *chip)
> >>  {
> >>         struct tpm_tis_data *priv = dev_get_drvdata(&chip->dev);
> >>         u32 intmask;
> >>         int rc;
> >> 
> >> -       if (priv->irq == 0)
> >> -               return;
> >> -
> >>         rc = tpm_tis_read32(priv, TPM_INT_ENABLE(priv->locality), &intmask);
> >>         if (rc < 0)
> >>                 intmask = 0;
> >> 
> >>         intmask &= ~TPM_GLOBAL_INT_ENABLE;
> >>         rc = tpm_tis_write32(priv, TPM_INT_ENABLE(priv->locality), intmask);
> >> +       chip->flags &= ~TPM_CHIP_FLAG_IRQ;
> >> +}
> >> +
> >> +static void disable_interrupts(struct tpm_chip *chip)
> >> +{
> >> +       struct tpm_tis_data *priv = dev_get_drvdata(&chip->dev);
> >> 
> >> +       if (priv->irq == 0)
> >> +               return;
> >> +
> >> +       __disable_interrupts(chip);
> >>         devm_free_irq(chip->dev.parent, priv->irq, chip);
> >>         priv->irq = 0;
> >> -       chip->flags &= ~TPM_CHIP_FLAG_IRQ;
> >>  }
> >> 
> >>  /*
> >> @@ -528,6 +539,12 @@ static int tpm_tis_send(struct tpm_chip *chip, u8 *buf, size_t len)
> >>         int rc, irq;
> >>         struct tpm_tis_data *priv = dev_get_drvdata(&chip->dev);
> >> 
> >> +       if (unlikely(storm_killed)) {
> >> +               devm_free_irq(chip->dev.parent, priv->irq, chip);
> >> +               priv->irq = 0;
> >> +               storm_killed = false;
> >> +       }
> >
> > OK this kind of bad solution because if tpm_tis_send() is not called,
> > then IRQ is never freed. AFAIK, devres_* do not sleep but use spin
> > lock, i.e. you could render out both storm_check and storm_killed.
> >
> 
> Is there a way to flag it for freeing later while in an interrupt
> context? I'm not sure where to clean it up since devm_free_irq can't be
> called in tis_int_handler.
> 
> Before diving further into that though, does anyone else have an opinion
> on ripping out the irq code, and just using polling? We've been only
> polling since 2015 anyways.

Given these all these issues, it's quite obvious that Windows side is
just polling. I'll ack a patch that removes the IRQ code for sure.

/Jarkko

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] tpm_tis: Disable interrupts on ThinkPad T490s
  2020-11-24 18:10         ` James Bottomley
@ 2020-11-29  3:21           ` Jarkko Sakkinen
  0 siblings, 0 replies; 22+ messages in thread
From: Jarkko Sakkinen @ 2020-11-29  3:21 UTC (permalink / raw)
  To: James Bottomley
  Cc: Jerry Snitselaar, Matthew Garrett, linux-integrity,
	Linux Kernel Mailing List, Peter Huewe, Jason Gunthorpe,
	Hans de Goede

On Tue, Nov 24, 2020 at 10:10:21AM -0800, James Bottomley wrote:
> On Tue, 2020-11-24 at 10:52 -0700, Jerry Snitselaar wrote:
> > Before diving further into that though, does anyone else have an
> > opinion on ripping out the irq code, and just using polling? We've
> > been only polling since 2015 anyways.
> 
> Well only a biased one, obviously: polling causes large amounts of busy
> waiting, which is a waste of CPU resources and does increase the time
> it takes us to do TPM operations ... not a concern if you're doing long
> computation ones, like signatures, but it is a problem for short
> operations like bulk updates of PCRs.  The other potential issue, as we
> saw with atmel is that if you prod the chip too often (which you have
> to do with polling) you risk upsetting it.  We've spent ages trying to
> tune the polling parameters to balance reduction of busy wait with chip
> upset and still, apparently, not quite got it right.  If the TPM has a
> functioning IRQ then it gets us out of the whole polling mess entirely.
> The big question is how many chips that report an IRQ actually have a
> malfunctioning one?
> 
> James

Do we have a way to know is Windows TPM code using IRQ's?

/Jarkko

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] tpm_tis: Disable interrupts on ThinkPad T490s
  2020-11-24 21:45         ` Hans de Goede
@ 2020-11-29  3:23           ` Jarkko Sakkinen
  2020-11-29 11:34             ` Hans de Goede
  0 siblings, 1 reply; 22+ messages in thread
From: Jarkko Sakkinen @ 2020-11-29  3:23 UTC (permalink / raw)
  To: Hans de Goede
  Cc: Jerry Snitselaar, Matthew Garrett, linux-integrity,
	Linux Kernel Mailing List, Peter Huewe, Jason Gunthorpe,
	James Bottomley

On Tue, Nov 24, 2020 at 10:45:01PM +0100, Hans de Goede wrote:
> Hi,
> 
> On 11/24/20 6:52 PM, Jerry Snitselaar wrote:
> > 
> > Jarkko Sakkinen @ 2020-11-23 20:26 MST:
> > 
> >> On Wed, Nov 18, 2020 at 11:36:20PM -0700, Jerry Snitselaar wrote:
> >>>
> >>> Matthew Garrett @ 2020-10-15 15:39 MST:
> >>>
> >>>> On Thu, Oct 15, 2020 at 2:44 PM Jerry Snitselaar <jsnitsel@redhat.com> wrote:
> >>>>>
> >>>>> There is a misconfiguration in the bios of the gpio pin used for the
> >>>>> interrupt in the T490s. When interrupts are enabled in the tpm_tis
> >>>>> driver code this results in an interrupt storm. This was initially
> >>>>> reported when we attempted to enable the interrupt code in the tpm_tis
> >>>>> driver, which previously wasn't setting a flag to enable it. Due to
> >>>>> the reports of the interrupt storm that code was reverted and we went back
> >>>>> to polling instead of using interrupts. Now that we know the T490s problem
> >>>>> is a firmware issue, add code to check if the system is a T490s and
> >>>>> disable interrupts if that is the case. This will allow us to enable
> >>>>> interrupts for everyone else. If the user has a fixed bios they can
> >>>>> force the enabling of interrupts with tpm_tis.interrupts=1 on the
> >>>>> kernel command line.
> >>>>
> >>>> I think an implication of this is that systems haven't been
> >>>> well-tested with interrupts enabled. In general when we've found a
> >>>> firmware issue in one place it ends up happening elsewhere as well, so
> >>>> it wouldn't surprise me if there are other machines that will also be
> >>>> unhappy with interrupts enabled. Would it be possible to automatically
> >>>> detect this case (eg, if we get more than a certain number of
> >>>> interrupts in a certain timeframe immediately after enabling the
> >>>> interrupt) and automatically fall back to polling in that case? It
> >>>> would also mean that users with fixed firmware wouldn't need to pass a
> >>>> parameter.
> >>>
> >>> I believe Matthew is correct here. I found another system today
> >>> with completely different vendor for both the system and the tpm chip.
> >>> In addition another Lenovo model, the L490, has the issue.
> >>>
> >>> This initial attempt at a solution like Matthew suggested works on
> >>> the system I found today, but I imagine it is all sorts of wrong.
> >>> In the 2 systems where I've seen it, there are about 100000 interrupts
> >>> in around 1.5 seconds, and then the irq code shuts down the interrupt
> >>> because they aren't being handled.
> >>>
> >>>
> >>> diff --git a/drivers/char/tpm/tpm_tis_core.c b/drivers/char/tpm/tpm_tis_core.c
> >>> index 49ae09ac604f..478e9d02a3fa 100644
> >>> --- a/drivers/char/tpm/tpm_tis_core.c
> >>> +++ b/drivers/char/tpm/tpm_tis_core.c
> >>> @@ -27,6 +27,11 @@
> >>>  #include "tpm.h"
> >>>  #include "tpm_tis_core.h"
> >>>
> >>> +static unsigned int time_start = 0;
> >>> +static bool storm_check = true;
> >>> +static bool storm_killed = false;
> >>> +static u32 irqs_fired = 0;
> >>
> >> Maybe kstat_irqs() would be a better idea than ad hoc stats.
> >>
> > 
> > Thanks, yes that would be better.
> > 
> >>> +
> >>>  static void tpm_tis_clkrun_enable(struct tpm_chip *chip, bool value);
> >>>
> >>>  static void tpm_tis_enable_interrupt(struct tpm_chip *chip, u8 mask)
> >>> @@ -464,25 +469,31 @@ static int tpm_tis_send_data(struct tpm_chip *chip, const u8 *buf, size_t len)
> >>>         return rc;
> >>>  }
> >>>
> >>> -static void disable_interrupts(struct tpm_chip *chip)
> >>> +static void __disable_interrupts(struct tpm_chip *chip)
> >>>  {
> >>>         struct tpm_tis_data *priv = dev_get_drvdata(&chip->dev);
> >>>         u32 intmask;
> >>>         int rc;
> >>>
> >>> -       if (priv->irq == 0)
> >>> -               return;
> >>> -
> >>>         rc = tpm_tis_read32(priv, TPM_INT_ENABLE(priv->locality), &intmask);
> >>>         if (rc < 0)
> >>>                 intmask = 0;
> >>>
> >>>         intmask &= ~TPM_GLOBAL_INT_ENABLE;
> >>>         rc = tpm_tis_write32(priv, TPM_INT_ENABLE(priv->locality), intmask);
> >>> +       chip->flags &= ~TPM_CHIP_FLAG_IRQ;
> >>> +}
> >>> +
> >>> +static void disable_interrupts(struct tpm_chip *chip)
> >>> +{
> >>> +       struct tpm_tis_data *priv = dev_get_drvdata(&chip->dev);
> >>>
> >>> +       if (priv->irq == 0)
> >>> +               return;
> >>> +
> >>> +       __disable_interrupts(chip);
> >>>         devm_free_irq(chip->dev.parent, priv->irq, chip);
> >>>         priv->irq = 0;
> >>> -       chip->flags &= ~TPM_CHIP_FLAG_IRQ;
> >>>  }
> >>>
> >>>  /*
> >>> @@ -528,6 +539,12 @@ static int tpm_tis_send(struct tpm_chip *chip, u8 *buf, size_t len)
> >>>         int rc, irq;
> >>>         struct tpm_tis_data *priv = dev_get_drvdata(&chip->dev);
> >>>
> >>> +       if (unlikely(storm_killed)) {
> >>> +               devm_free_irq(chip->dev.parent, priv->irq, chip);
> >>> +               priv->irq = 0;
> >>> +               storm_killed = false;
> >>> +       }
> >>
> >> OK this kind of bad solution because if tpm_tis_send() is not called,
> >> then IRQ is never freed. AFAIK, devres_* do not sleep but use spin
> >> lock, i.e. you could render out both storm_check and storm_killed.
> >>
> > 
> > Is there a way to flag it for freeing later while in an interrupt
> > context? I'm not sure where to clean it up since devm_free_irq can't be
> > called in tis_int_handler.
> 
> You could add a workqueue work-struct just for this and queue that up
> to do the free when you detect the storm. That will then run pretty much
> immediately, avoiding the storm going on for (much) longer.

That's sounds feasible.

> > Before diving further into that though, does anyone else have an opinion
> > on ripping out the irq code, and just using polling? We've been only
> > polling since 2015 anyways.
> 
> Given James Bottomley's reply I guess it would be worthwhile to get the
> storm detection to work.

OK, agreed. I take my words back from a response few minutes ago :-)

> Regards,
> 
> Hans

/Jarkko

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] tpm_tis: Disable interrupts on ThinkPad T490s
  2020-11-29  3:23           ` Jarkko Sakkinen
@ 2020-11-29 11:34             ` Hans de Goede
  2020-12-02 16:07               ` Jarkko Sakkinen
  0 siblings, 1 reply; 22+ messages in thread
From: Hans de Goede @ 2020-11-29 11:34 UTC (permalink / raw)
  To: Jarkko Sakkinen
  Cc: Jerry Snitselaar, Matthew Garrett, linux-integrity,
	Linux Kernel Mailing List, Peter Huewe, Jason Gunthorpe,
	James Bottomley

Hi All,

On 11/29/20 4:23 AM, Jarkko Sakkinen wrote:
> On Tue, Nov 24, 2020 at 10:45:01PM +0100, Hans de Goede wrote:
>> Hi,
>>
>> On 11/24/20 6:52 PM, Jerry Snitselaar wrote:
>>>
>>> Jarkko Sakkinen @ 2020-11-23 20:26 MST:
>>>
>>>> On Wed, Nov 18, 2020 at 11:36:20PM -0700, Jerry Snitselaar wrote:
>>>>>
>>>>> Matthew Garrett @ 2020-10-15 15:39 MST:
>>>>>
>>>>>> On Thu, Oct 15, 2020 at 2:44 PM Jerry Snitselaar <jsnitsel@redhat.com> wrote:
>>>>>>>
>>>>>>> There is a misconfiguration in the bios of the gpio pin used for the
>>>>>>> interrupt in the T490s. When interrupts are enabled in the tpm_tis
>>>>>>> driver code this results in an interrupt storm. This was initially
>>>>>>> reported when we attempted to enable the interrupt code in the tpm_tis
>>>>>>> driver, which previously wasn't setting a flag to enable it. Due to
>>>>>>> the reports of the interrupt storm that code was reverted and we went back
>>>>>>> to polling instead of using interrupts. Now that we know the T490s problem
>>>>>>> is a firmware issue, add code to check if the system is a T490s and
>>>>>>> disable interrupts if that is the case. This will allow us to enable
>>>>>>> interrupts for everyone else. If the user has a fixed bios they can
>>>>>>> force the enabling of interrupts with tpm_tis.interrupts=1 on the
>>>>>>> kernel command line.
>>>>>>
>>>>>> I think an implication of this is that systems haven't been
>>>>>> well-tested with interrupts enabled. In general when we've found a
>>>>>> firmware issue in one place it ends up happening elsewhere as well, so
>>>>>> it wouldn't surprise me if there are other machines that will also be
>>>>>> unhappy with interrupts enabled. Would it be possible to automatically
>>>>>> detect this case (eg, if we get more than a certain number of
>>>>>> interrupts in a certain timeframe immediately after enabling the
>>>>>> interrupt) and automatically fall back to polling in that case? It
>>>>>> would also mean that users with fixed firmware wouldn't need to pass a
>>>>>> parameter.
>>>>>
>>>>> I believe Matthew is correct here. I found another system today
>>>>> with completely different vendor for both the system and the tpm chip.
>>>>> In addition another Lenovo model, the L490, has the issue.
>>>>>
>>>>> This initial attempt at a solution like Matthew suggested works on
>>>>> the system I found today, but I imagine it is all sorts of wrong.
>>>>> In the 2 systems where I've seen it, there are about 100000 interrupts
>>>>> in around 1.5 seconds, and then the irq code shuts down the interrupt
>>>>> because they aren't being handled.
>>>>>
>>>>>
>>>>> diff --git a/drivers/char/tpm/tpm_tis_core.c b/drivers/char/tpm/tpm_tis_core.c
>>>>> index 49ae09ac604f..478e9d02a3fa 100644
>>>>> --- a/drivers/char/tpm/tpm_tis_core.c
>>>>> +++ b/drivers/char/tpm/tpm_tis_core.c
>>>>> @@ -27,6 +27,11 @@
>>>>>  #include "tpm.h"
>>>>>  #include "tpm_tis_core.h"
>>>>>
>>>>> +static unsigned int time_start = 0;
>>>>> +static bool storm_check = true;
>>>>> +static bool storm_killed = false;
>>>>> +static u32 irqs_fired = 0;
>>>>
>>>> Maybe kstat_irqs() would be a better idea than ad hoc stats.
>>>>
>>>
>>> Thanks, yes that would be better.
>>>
>>>>> +
>>>>>  static void tpm_tis_clkrun_enable(struct tpm_chip *chip, bool value);
>>>>>
>>>>>  static void tpm_tis_enable_interrupt(struct tpm_chip *chip, u8 mask)
>>>>> @@ -464,25 +469,31 @@ static int tpm_tis_send_data(struct tpm_chip *chip, const u8 *buf, size_t len)
>>>>>         return rc;
>>>>>  }
>>>>>
>>>>> -static void disable_interrupts(struct tpm_chip *chip)
>>>>> +static void __disable_interrupts(struct tpm_chip *chip)
>>>>>  {
>>>>>         struct tpm_tis_data *priv = dev_get_drvdata(&chip->dev);
>>>>>         u32 intmask;
>>>>>         int rc;
>>>>>
>>>>> -       if (priv->irq == 0)
>>>>> -               return;
>>>>> -
>>>>>         rc = tpm_tis_read32(priv, TPM_INT_ENABLE(priv->locality), &intmask);
>>>>>         if (rc < 0)
>>>>>                 intmask = 0;
>>>>>
>>>>>         intmask &= ~TPM_GLOBAL_INT_ENABLE;
>>>>>         rc = tpm_tis_write32(priv, TPM_INT_ENABLE(priv->locality), intmask);
>>>>> +       chip->flags &= ~TPM_CHIP_FLAG_IRQ;
>>>>> +}
>>>>> +
>>>>> +static void disable_interrupts(struct tpm_chip *chip)
>>>>> +{
>>>>> +       struct tpm_tis_data *priv = dev_get_drvdata(&chip->dev);
>>>>>
>>>>> +       if (priv->irq == 0)
>>>>> +               return;
>>>>> +
>>>>> +       __disable_interrupts(chip);
>>>>>         devm_free_irq(chip->dev.parent, priv->irq, chip);
>>>>>         priv->irq = 0;
>>>>> -       chip->flags &= ~TPM_CHIP_FLAG_IRQ;
>>>>>  }
>>>>>
>>>>>  /*
>>>>> @@ -528,6 +539,12 @@ static int tpm_tis_send(struct tpm_chip *chip, u8 *buf, size_t len)
>>>>>         int rc, irq;
>>>>>         struct tpm_tis_data *priv = dev_get_drvdata(&chip->dev);
>>>>>
>>>>> +       if (unlikely(storm_killed)) {
>>>>> +               devm_free_irq(chip->dev.parent, priv->irq, chip);
>>>>> +               priv->irq = 0;
>>>>> +               storm_killed = false;
>>>>> +       }
>>>>
>>>> OK this kind of bad solution because if tpm_tis_send() is not called,
>>>> then IRQ is never freed. AFAIK, devres_* do not sleep but use spin
>>>> lock, i.e. you could render out both storm_check and storm_killed.
>>>>
>>>
>>> Is there a way to flag it for freeing later while in an interrupt
>>> context? I'm not sure where to clean it up since devm_free_irq can't be
>>> called in tis_int_handler.
>>
>> You could add a workqueue work-struct just for this and queue that up
>> to do the free when you detect the storm. That will then run pretty much
>> immediately, avoiding the storm going on for (much) longer.
> 
> That's sounds feasible.
> 
>>> Before diving further into that though, does anyone else have an opinion
>>> on ripping out the irq code, and just using polling? We've been only
>>> polling since 2015 anyways.
>>
>> Given James Bottomley's reply I guess it would be worthwhile to get the
>> storm detection to work.
> 
> OK, agreed. I take my words back from a response few minutes ago :-)

:)

To be clear, I think we should give the storm detection a go. Especially
given the problems which James has seen with polling on some TPMs.

But if that turns out to not be feasible I agree we should just either
disable IRQs by default on standard x86 platforms, or just remove the
IRQ support all together.

Regards,

Hans


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] tpm_tis: Disable interrupts on ThinkPad T490s
  2020-11-29 11:34             ` Hans de Goede
@ 2020-12-02 16:07               ` Jarkko Sakkinen
  0 siblings, 0 replies; 22+ messages in thread
From: Jarkko Sakkinen @ 2020-12-02 16:07 UTC (permalink / raw)
  To: Hans de Goede
  Cc: Jerry Snitselaar, Matthew Garrett, linux-integrity,
	Linux Kernel Mailing List, Peter Huewe, Jason Gunthorpe,
	James Bottomley

On Sun, Nov 29, 2020 at 12:34:34PM +0100, Hans de Goede wrote:
> Hi All,
> 
> On 11/29/20 4:23 AM, Jarkko Sakkinen wrote:
> > On Tue, Nov 24, 2020 at 10:45:01PM +0100, Hans de Goede wrote:
> >> Hi,
> >>
> >> On 11/24/20 6:52 PM, Jerry Snitselaar wrote:
> >>>
> >>> Jarkko Sakkinen @ 2020-11-23 20:26 MST:
> >>>
> >>>> On Wed, Nov 18, 2020 at 11:36:20PM -0700, Jerry Snitselaar wrote:
> >>>>>
> >>>>> Matthew Garrett @ 2020-10-15 15:39 MST:
> >>>>>
> >>>>>> On Thu, Oct 15, 2020 at 2:44 PM Jerry Snitselaar <jsnitsel@redhat.com> wrote:
> >>>>>>>
> >>>>>>> There is a misconfiguration in the bios of the gpio pin used for the
> >>>>>>> interrupt in the T490s. When interrupts are enabled in the tpm_tis
> >>>>>>> driver code this results in an interrupt storm. This was initially
> >>>>>>> reported when we attempted to enable the interrupt code in the tpm_tis
> >>>>>>> driver, which previously wasn't setting a flag to enable it. Due to
> >>>>>>> the reports of the interrupt storm that code was reverted and we went back
> >>>>>>> to polling instead of using interrupts. Now that we know the T490s problem
> >>>>>>> is a firmware issue, add code to check if the system is a T490s and
> >>>>>>> disable interrupts if that is the case. This will allow us to enable
> >>>>>>> interrupts for everyone else. If the user has a fixed bios they can
> >>>>>>> force the enabling of interrupts with tpm_tis.interrupts=1 on the
> >>>>>>> kernel command line.
> >>>>>>
> >>>>>> I think an implication of this is that systems haven't been
> >>>>>> well-tested with interrupts enabled. In general when we've found a
> >>>>>> firmware issue in one place it ends up happening elsewhere as well, so
> >>>>>> it wouldn't surprise me if there are other machines that will also be
> >>>>>> unhappy with interrupts enabled. Would it be possible to automatically
> >>>>>> detect this case (eg, if we get more than a certain number of
> >>>>>> interrupts in a certain timeframe immediately after enabling the
> >>>>>> interrupt) and automatically fall back to polling in that case? It
> >>>>>> would also mean that users with fixed firmware wouldn't need to pass a
> >>>>>> parameter.
> >>>>>
> >>>>> I believe Matthew is correct here. I found another system today
> >>>>> with completely different vendor for both the system and the tpm chip.
> >>>>> In addition another Lenovo model, the L490, has the issue.
> >>>>>
> >>>>> This initial attempt at a solution like Matthew suggested works on
> >>>>> the system I found today, but I imagine it is all sorts of wrong.
> >>>>> In the 2 systems where I've seen it, there are about 100000 interrupts
> >>>>> in around 1.5 seconds, and then the irq code shuts down the interrupt
> >>>>> because they aren't being handled.
> >>>>>
> >>>>>
> >>>>> diff --git a/drivers/char/tpm/tpm_tis_core.c b/drivers/char/tpm/tpm_tis_core.c
> >>>>> index 49ae09ac604f..478e9d02a3fa 100644
> >>>>> --- a/drivers/char/tpm/tpm_tis_core.c
> >>>>> +++ b/drivers/char/tpm/tpm_tis_core.c
> >>>>> @@ -27,6 +27,11 @@
> >>>>>  #include "tpm.h"
> >>>>>  #include "tpm_tis_core.h"
> >>>>>
> >>>>> +static unsigned int time_start = 0;
> >>>>> +static bool storm_check = true;
> >>>>> +static bool storm_killed = false;
> >>>>> +static u32 irqs_fired = 0;
> >>>>
> >>>> Maybe kstat_irqs() would be a better idea than ad hoc stats.
> >>>>
> >>>
> >>> Thanks, yes that would be better.
> >>>
> >>>>> +
> >>>>>  static void tpm_tis_clkrun_enable(struct tpm_chip *chip, bool value);
> >>>>>
> >>>>>  static void tpm_tis_enable_interrupt(struct tpm_chip *chip, u8 mask)
> >>>>> @@ -464,25 +469,31 @@ static int tpm_tis_send_data(struct tpm_chip *chip, const u8 *buf, size_t len)
> >>>>>         return rc;
> >>>>>  }
> >>>>>
> >>>>> -static void disable_interrupts(struct tpm_chip *chip)
> >>>>> +static void __disable_interrupts(struct tpm_chip *chip)
> >>>>>  {
> >>>>>         struct tpm_tis_data *priv = dev_get_drvdata(&chip->dev);
> >>>>>         u32 intmask;
> >>>>>         int rc;
> >>>>>
> >>>>> -       if (priv->irq == 0)
> >>>>> -               return;
> >>>>> -
> >>>>>         rc = tpm_tis_read32(priv, TPM_INT_ENABLE(priv->locality), &intmask);
> >>>>>         if (rc < 0)
> >>>>>                 intmask = 0;
> >>>>>
> >>>>>         intmask &= ~TPM_GLOBAL_INT_ENABLE;
> >>>>>         rc = tpm_tis_write32(priv, TPM_INT_ENABLE(priv->locality), intmask);
> >>>>> +       chip->flags &= ~TPM_CHIP_FLAG_IRQ;
> >>>>> +}
> >>>>> +
> >>>>> +static void disable_interrupts(struct tpm_chip *chip)
> >>>>> +{
> >>>>> +       struct tpm_tis_data *priv = dev_get_drvdata(&chip->dev);
> >>>>>
> >>>>> +       if (priv->irq == 0)
> >>>>> +               return;
> >>>>> +
> >>>>> +       __disable_interrupts(chip);
> >>>>>         devm_free_irq(chip->dev.parent, priv->irq, chip);
> >>>>>         priv->irq = 0;
> >>>>> -       chip->flags &= ~TPM_CHIP_FLAG_IRQ;
> >>>>>  }
> >>>>>
> >>>>>  /*
> >>>>> @@ -528,6 +539,12 @@ static int tpm_tis_send(struct tpm_chip *chip, u8 *buf, size_t len)
> >>>>>         int rc, irq;
> >>>>>         struct tpm_tis_data *priv = dev_get_drvdata(&chip->dev);
> >>>>>
> >>>>> +       if (unlikely(storm_killed)) {
> >>>>> +               devm_free_irq(chip->dev.parent, priv->irq, chip);
> >>>>> +               priv->irq = 0;
> >>>>> +               storm_killed = false;
> >>>>> +       }
> >>>>
> >>>> OK this kind of bad solution because if tpm_tis_send() is not called,
> >>>> then IRQ is never freed. AFAIK, devres_* do not sleep but use spin
> >>>> lock, i.e. you could render out both storm_check and storm_killed.
> >>>>
> >>>
> >>> Is there a way to flag it for freeing later while in an interrupt
> >>> context? I'm not sure where to clean it up since devm_free_irq can't be
> >>> called in tis_int_handler.
> >>
> >> You could add a workqueue work-struct just for this and queue that up
> >> to do the free when you detect the storm. That will then run pretty much
> >> immediately, avoiding the storm going on for (much) longer.
> > 
> > That's sounds feasible.
> > 
> >>> Before diving further into that though, does anyone else have an opinion
> >>> on ripping out the irq code, and just using polling? We've been only
> >>> polling since 2015 anyways.
> >>
> >> Given James Bottomley's reply I guess it would be worthwhile to get the
> >> storm detection to work.
> > 
> > OK, agreed. I take my words back from a response few minutes ago :-)
> 
> :)
> 
> To be clear, I think we should give the storm detection a go. Especially
> given the problems which James has seen with polling on some TPMs.
> 
> But if that turns out to not be feasible I agree we should just either
> disable IRQs by default on standard x86 platforms, or just remove the
> IRQ support all together.

Just for completeness: one option is also to whitelist IRQ's.

> Regards,
> 
> Hans

/Jarkko

^ permalink raw reply	[flat|nested] 22+ messages in thread

end of thread, other threads:[~2020-12-02 16:08 UTC | newest]

Thread overview: 22+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-10-15 21:44 [PATCH] tpm_tis: Disable interrupts on ThinkPad T490s Jerry Snitselaar
2020-10-15 22:01 ` Jerry Snitselaar
2020-10-15 22:39 ` Matthew Garrett
2020-10-16  6:12   ` Hans de Goede
2020-11-19  6:36   ` Jerry Snitselaar
2020-11-19 14:42     ` Hans de Goede
2020-11-19 17:05       ` Jerry Snitselaar
2020-11-23 12:19         ` Hans de Goede
2020-11-24  3:27       ` Jarkko Sakkinen
2020-11-24  3:29         ` Jarkko Sakkinen
2020-11-24  3:26     ` Jarkko Sakkinen
2020-11-24 17:52       ` Jerry Snitselaar
2020-11-24 18:10         ` James Bottomley
2020-11-29  3:21           ` Jarkko Sakkinen
2020-11-24 21:45         ` Hans de Goede
2020-11-29  3:23           ` Jarkko Sakkinen
2020-11-29 11:34             ` Hans de Goede
2020-12-02 16:07               ` Jarkko Sakkinen
2020-11-29  3:18         ` Jarkko Sakkinen
2020-10-16  6:10 ` Hans de Goede
2020-10-18 21:11 ` Jarkko Sakkinen
2020-10-18 21:14   ` Jarkko Sakkinen

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).