linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v2] irqchip/gic-v3-its: Add workaround for ThunderX2 erratum #174
@ 2018-01-18  5:28 Ganapatrao Kulkarni
  2018-01-19 12:23 ` Marc Zyngier
  2018-01-21  7:00 ` Jayachandran C
  0 siblings, 2 replies; 7+ messages in thread
From: Ganapatrao Kulkarni @ 2018-01-18  5:28 UTC (permalink / raw)
  To: linux-doc, linux-kernel, linux-arm-kernel
  Cc: marc.zyngier, tglx, jason, catalin.marinas, will.deacon, corbet,
	jnair, Robert.Richter, Jan.Glauber, Vadim.Lomovtsev, gklkml16

This erratum is observed on the ThunderX2 GICv3 ITS. When a
MOVI command is used to change affinity of a LPI to a collection/cpu
on another node, the LPI is not delivered to the cpu.
An additional INV command is required after the MOVI to deliver
the LPI to the new destination.

If we add INV after MOVI, there is a chance that we lose LPIs which
are raised when the affinity is changed. So for now, adding workaround fix
to disable inter node affinity change.

Signed-off-by: Ganapatrao Kulkarni <ganapatrao.kulkarni@cavium.com>
---

v2: Added workaround to avoid inter node affinity change.

v1: Initial patch

 Documentation/arm64/silicon-errata.txt |  1 +
 arch/arm64/Kconfig                     | 10 ++++++++++
 drivers/irqchip/irq-gic-v3-its.c       | 21 ++++++++++++++++++++-
 3 files changed, 31 insertions(+), 1 deletion(-)

diff --git a/Documentation/arm64/silicon-errata.txt b/Documentation/arm64/silicon-errata.txt
index fc1c884..fb27cb5 100644
--- a/Documentation/arm64/silicon-errata.txt
+++ b/Documentation/arm64/silicon-errata.txt
@@ -63,6 +63,7 @@ stable kernels.
 | Cavium         | ThunderX Core   | #27456          | CAVIUM_ERRATUM_27456        |
 | Cavium         | ThunderX Core   | #30115          | CAVIUM_ERRATUM_30115        |
 | Cavium         | ThunderX SMMUv2 | #27704          | N/A                         |
+| Cavium         | ThunderX2 ITS   | #174            | CAVIUM_ERRATUM_174          |
 | Cavium         | ThunderX2 SMMUv3| #74             | N/A                         |
 | Cavium         | ThunderX2 SMMUv3| #126            | N/A                         |
 |                |                 |                 |                             |
diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index c9a7e9e..0dbf3bd 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -461,6 +461,16 @@ config ARM64_ERRATUM_843419
 
 	  If unsure, say Y.
 
+config CAVIUM_ERRATUM_174
+	bool "Cavium ThunderX2 erratum 174"
+	default y
+	help
+	  Cavium ThunderX2 dual socket systems may loose interrupts
+	  on affinity change to a cpu on other node.
+	  This workaround fix avoids inter node affinity change.
+
+	  If unsure, say Y.
+
 config CAVIUM_ERRATUM_22375
 	bool "Cavium erratum 22375, 24313"
 	default y
diff --git a/drivers/irqchip/irq-gic-v3-its.c b/drivers/irqchip/irq-gic-v3-its.c
index 06f025f..b0cb528 100644
--- a/drivers/irqchip/irq-gic-v3-its.c
+++ b/drivers/irqchip/irq-gic-v3-its.c
@@ -46,6 +46,7 @@
 #define ITS_FLAGS_CMDQ_NEEDS_FLUSHING		(1ULL << 0)
 #define ITS_FLAGS_WORKAROUND_CAVIUM_22375	(1ULL << 1)
 #define ITS_FLAGS_WORKAROUND_CAVIUM_23144	(1ULL << 2)
+#define ITS_FLAGS_WORKAROUND_CAVIUM_174		(1ULL << 3)
 
 #define RDIST_FLAGS_PROPBASE_NEEDS_FLUSHING	(1 << 0)
 
@@ -1102,7 +1103,8 @@ static int its_set_affinity(struct irq_data *d, const struct cpumask *mask_val,
 		return -EINVAL;
 
        /* lpi cannot be routed to a redistributor that is on a foreign node */
-	if (its_dev->its->flags & ITS_FLAGS_WORKAROUND_CAVIUM_23144) {
+	if (its_dev->its->flags & ITS_FLAGS_WORKAROUND_CAVIUM_23144 ||
+			its_dev->its->flags & ITS_FLAGS_WORKAROUND_CAVIUM_174) {
 		if (its_dev->its->numa_node >= 0) {
 			cpu_mask = cpumask_of_node(its_dev->its->numa_node);
 			if (!cpumask_intersects(mask_val, cpu_mask))
@@ -2904,6 +2906,15 @@ static int its_force_quiescent(void __iomem *base)
 	}
 }
 
+static bool __maybe_unused its_enable_quirk_cavium_174(void *data)
+{
+	struct its_node *its = data;
+
+	its->flags |= ITS_FLAGS_WORKAROUND_CAVIUM_174;
+
+	return true;
+}
+
 static bool __maybe_unused its_enable_quirk_cavium_22375(void *data)
 {
 	struct its_node *its = data;
@@ -3031,6 +3042,14 @@ static const struct gic_quirk its_quirks[] = {
 		.init	= its_enable_quirk_hip07_161600802,
 	},
 #endif
+#ifdef CONFIG_CAVIUM_ERRATUM_174
+	{
+		.desc	= "ITS: Cavium ThunderX2 erratum 174",
+		.iidr	= 0x13f,	/* ThunderX2 pass A1/A2/B0 */
+		.mask	= 0xffffffff,
+		.init	= its_enable_quirk_cavium_174,
+	},
+#endif
 	{
 	}
 };
-- 
2.9.4

^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH v2] irqchip/gic-v3-its: Add workaround for ThunderX2 erratum #174
  2018-01-18  5:28 [PATCH v2] irqchip/gic-v3-its: Add workaround for ThunderX2 erratum #174 Ganapatrao Kulkarni
@ 2018-01-19 12:23 ` Marc Zyngier
  2018-01-19 14:22   ` Ganapatrao Kulkarni
  2018-01-21  7:00 ` Jayachandran C
  1 sibling, 1 reply; 7+ messages in thread
From: Marc Zyngier @ 2018-01-19 12:23 UTC (permalink / raw)
  To: Ganapatrao Kulkarni, linux-doc, linux-kernel, linux-arm-kernel
  Cc: tglx, jason, catalin.marinas, will.deacon, corbet, jnair,
	Robert.Richter, Jan.Glauber, Vadim.Lomovtsev, gklkml16

On 18/01/18 05:28, Ganapatrao Kulkarni wrote:
> This erratum is observed on the ThunderX2 GICv3 ITS. When a
> MOVI command is used to change affinity of a LPI to a collection/cpu
> on another node, the LPI is not delivered to the cpu.
> An additional INV command is required after the MOVI to deliver
> the LPI to the new destination.
> 
> If we add INV after MOVI, there is a chance that we lose LPIs which
> are raised when the affinity is changed. So for now, adding workaround fix
> to disable inter node affinity change.
> 
> Signed-off-by: Ganapatrao Kulkarni <ganapatrao.kulkarni@cavium.com>
> ---
> 
> v2: Added workaround to avoid inter node affinity change.
> 
> v1: Initial patch
> 
>  Documentation/arm64/silicon-errata.txt |  1 +
>  arch/arm64/Kconfig                     | 10 ++++++++++
>  drivers/irqchip/irq-gic-v3-its.c       | 21 ++++++++++++++++++++-
>  3 files changed, 31 insertions(+), 1 deletion(-)
> 
> diff --git a/Documentation/arm64/silicon-errata.txt b/Documentation/arm64/silicon-errata.txt
> index fc1c884..fb27cb5 100644
> --- a/Documentation/arm64/silicon-errata.txt
> +++ b/Documentation/arm64/silicon-errata.txt
> @@ -63,6 +63,7 @@ stable kernels.
>  | Cavium         | ThunderX Core   | #27456          | CAVIUM_ERRATUM_27456        |
>  | Cavium         | ThunderX Core   | #30115          | CAVIUM_ERRATUM_30115        |
>  | Cavium         | ThunderX SMMUv2 | #27704          | N/A                         |
> +| Cavium         | ThunderX2 ITS   | #174            | CAVIUM_ERRATUM_174          |
>  | Cavium         | ThunderX2 SMMUv3| #74             | N/A                         |
>  | Cavium         | ThunderX2 SMMUv3| #126            | N/A                         |
>  |                |                 |                 |                             |
> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> index c9a7e9e..0dbf3bd 100644
> --- a/arch/arm64/Kconfig
> +++ b/arch/arm64/Kconfig
> @@ -461,6 +461,16 @@ config ARM64_ERRATUM_843419
>  
>  	  If unsure, say Y.
>  
> +config CAVIUM_ERRATUM_174
> +	bool "Cavium ThunderX2 erratum 174"
> +	default y
> +	help
> +	  Cavium ThunderX2 dual socket systems may loose interrupts
> +	  on affinity change to a cpu on other node.
> +	  This workaround fix avoids inter node affinity change.
> +
> +	  If unsure, say Y.
> +
>  config CAVIUM_ERRATUM_22375
>  	bool "Cavium erratum 22375, 24313"
>  	default y
> diff --git a/drivers/irqchip/irq-gic-v3-its.c b/drivers/irqchip/irq-gic-v3-its.c
> index 06f025f..b0cb528 100644
> --- a/drivers/irqchip/irq-gic-v3-its.c
> +++ b/drivers/irqchip/irq-gic-v3-its.c
> @@ -46,6 +46,7 @@
>  #define ITS_FLAGS_CMDQ_NEEDS_FLUSHING		(1ULL << 0)
>  #define ITS_FLAGS_WORKAROUND_CAVIUM_22375	(1ULL << 1)
>  #define ITS_FLAGS_WORKAROUND_CAVIUM_23144	(1ULL << 2)
> +#define ITS_FLAGS_WORKAROUND_CAVIUM_174		(1ULL << 3)

Instead of inventing a new flag, please rename the existing one to
ITS_FLAG_WORKAROUND_RESTRICT_NODE_AFFINITY (or something similar). There
is really no need to have two flags that do the exact same thing,

>  
>  #define RDIST_FLAGS_PROPBASE_NEEDS_FLUSHING	(1 << 0)
>  
> @@ -1102,7 +1103,8 @@ static int its_set_affinity(struct irq_data *d, const struct cpumask *mask_val,
>  		return -EINVAL;
>  
>         /* lpi cannot be routed to a redistributor that is on a foreign node */
> -	if (its_dev->its->flags & ITS_FLAGS_WORKAROUND_CAVIUM_23144) {
> +	if (its_dev->its->flags & ITS_FLAGS_WORKAROUND_CAVIUM_23144 ||
> +			its_dev->its->flags & ITS_FLAGS_WORKAROUND_CAVIUM_174) {
>  		if (its_dev->its->numa_node >= 0) {
>  			cpu_mask = cpumask_of_node(its_dev->its->numa_node);
>  			if (!cpumask_intersects(mask_val, cpu_mask))
> @@ -2904,6 +2906,15 @@ static int its_force_quiescent(void __iomem *base)
>  	}
>  }
>  
> +static bool __maybe_unused its_enable_quirk_cavium_174(void *data)
> +{
> +	struct its_node *its = data;
> +
> +	its->flags |= ITS_FLAGS_WORKAROUND_CAVIUM_174;
> +
> +	return true;
> +}
> +
>  static bool __maybe_unused its_enable_quirk_cavium_22375(void *data)
>  {
>  	struct its_node *its = data;
> @@ -3031,6 +3042,14 @@ static const struct gic_quirk its_quirks[] = {
>  		.init	= its_enable_quirk_hip07_161600802,
>  	},
>  #endif
> +#ifdef CONFIG_CAVIUM_ERRATUM_174
> +	{
> +		.desc	= "ITS: Cavium ThunderX2 erratum 174",
> +		.iidr	= 0x13f,	/* ThunderX2 pass A1/A2/B0 */
> +		.mask	= 0xffffffff,
> +		.init	= its_enable_quirk_cavium_174,
> +	},
> +#endif
>  	{
>  	}
>  };
> 

Thanks,

	M.
-- 
Jazz is not dead. It just smells funny...

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH v2] irqchip/gic-v3-its: Add workaround for ThunderX2 erratum #174
  2018-01-19 12:23 ` Marc Zyngier
@ 2018-01-19 14:22   ` Ganapatrao Kulkarni
  0 siblings, 0 replies; 7+ messages in thread
From: Ganapatrao Kulkarni @ 2018-01-19 14:22 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: Ganapatrao Kulkarni, linux-doc, linux-kernel, linux-arm-kernel,
	Thomas Gleixner, jason, catalin.marinas, Will Deacon,
	Jonathan Corbet, jnair, Robert Richter, Jan.Glauber,
	Vadim.Lomovtsev

On Fri, Jan 19, 2018 at 5:53 PM, Marc Zyngier <marc.zyngier@arm.com> wrote:
> On 18/01/18 05:28, Ganapatrao Kulkarni wrote:
>> This erratum is observed on the ThunderX2 GICv3 ITS. When a
>> MOVI command is used to change affinity of a LPI to a collection/cpu
>> on another node, the LPI is not delivered to the cpu.
>> An additional INV command is required after the MOVI to deliver
>> the LPI to the new destination.
>>
>> If we add INV after MOVI, there is a chance that we lose LPIs which
>> are raised when the affinity is changed. So for now, adding workaround fix
>> to disable inter node affinity change.
>>
>> Signed-off-by: Ganapatrao Kulkarni <ganapatrao.kulkarni@cavium.com>
>> ---
>>
>> v2: Added workaround to avoid inter node affinity change.
>>
>> v1: Initial patch
>>
>>  Documentation/arm64/silicon-errata.txt |  1 +
>>  arch/arm64/Kconfig                     | 10 ++++++++++
>>  drivers/irqchip/irq-gic-v3-its.c       | 21 ++++++++++++++++++++-
>>  3 files changed, 31 insertions(+), 1 deletion(-)
>>
>> diff --git a/Documentation/arm64/silicon-errata.txt b/Documentation/arm64/silicon-errata.txt
>> index fc1c884..fb27cb5 100644
>> --- a/Documentation/arm64/silicon-errata.txt
>> +++ b/Documentation/arm64/silicon-errata.txt
>> @@ -63,6 +63,7 @@ stable kernels.
>>  | Cavium         | ThunderX Core   | #27456          | CAVIUM_ERRATUM_27456        |
>>  | Cavium         | ThunderX Core   | #30115          | CAVIUM_ERRATUM_30115        |
>>  | Cavium         | ThunderX SMMUv2 | #27704          | N/A                         |
>> +| Cavium         | ThunderX2 ITS   | #174            | CAVIUM_ERRATUM_174          |
>>  | Cavium         | ThunderX2 SMMUv3| #74             | N/A                         |
>>  | Cavium         | ThunderX2 SMMUv3| #126            | N/A                         |
>>  |                |                 |                 |                             |
>> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
>> index c9a7e9e..0dbf3bd 100644
>> --- a/arch/arm64/Kconfig
>> +++ b/arch/arm64/Kconfig
>> @@ -461,6 +461,16 @@ config ARM64_ERRATUM_843419
>>
>>         If unsure, say Y.
>>
>> +config CAVIUM_ERRATUM_174
>> +     bool "Cavium ThunderX2 erratum 174"
>> +     default y
>> +     help
>> +       Cavium ThunderX2 dual socket systems may loose interrupts
>> +       on affinity change to a cpu on other node.
>> +       This workaround fix avoids inter node affinity change.
>> +
>> +       If unsure, say Y.
>> +
>>  config CAVIUM_ERRATUM_22375
>>       bool "Cavium erratum 22375, 24313"
>>       default y
>> diff --git a/drivers/irqchip/irq-gic-v3-its.c b/drivers/irqchip/irq-gic-v3-its.c
>> index 06f025f..b0cb528 100644
>> --- a/drivers/irqchip/irq-gic-v3-its.c
>> +++ b/drivers/irqchip/irq-gic-v3-its.c
>> @@ -46,6 +46,7 @@
>>  #define ITS_FLAGS_CMDQ_NEEDS_FLUSHING                (1ULL << 0)
>>  #define ITS_FLAGS_WORKAROUND_CAVIUM_22375    (1ULL << 1)
>>  #define ITS_FLAGS_WORKAROUND_CAVIUM_23144    (1ULL << 2)
>> +#define ITS_FLAGS_WORKAROUND_CAVIUM_174              (1ULL << 3)
>
> Instead of inventing a new flag, please rename the existing one to
> ITS_FLAG_WORKAROUND_RESTRICT_NODE_AFFINITY (or something similar). There
> is really no need to have two flags that do the exact same thing,

#23144 is used to restrict ITS to collection mapping too,
where as 174 is only restricts cross node affinity.
Having said that, Since we are restricting affinity in #174,
i see there is no use of having ITS to other node collection mapping.
There should not be any issue if we club flag. I will post this change
in next version.

>
>>
>>  #define RDIST_FLAGS_PROPBASE_NEEDS_FLUSHING  (1 << 0)
>>
>> @@ -1102,7 +1103,8 @@ static int its_set_affinity(struct irq_data *d, const struct cpumask *mask_val,
>>               return -EINVAL;
>>
>>         /* lpi cannot be routed to a redistributor that is on a foreign node */
>> -     if (its_dev->its->flags & ITS_FLAGS_WORKAROUND_CAVIUM_23144) {
>> +     if (its_dev->its->flags & ITS_FLAGS_WORKAROUND_CAVIUM_23144 ||
>> +                     its_dev->its->flags & ITS_FLAGS_WORKAROUND_CAVIUM_174) {
>>               if (its_dev->its->numa_node >= 0) {
>>                       cpu_mask = cpumask_of_node(its_dev->its->numa_node);
>>                       if (!cpumask_intersects(mask_val, cpu_mask))
>> @@ -2904,6 +2906,15 @@ static int its_force_quiescent(void __iomem *base)
>>       }
>>  }
>>
>> +static bool __maybe_unused its_enable_quirk_cavium_174(void *data)
>> +{
>> +     struct its_node *its = data;
>> +
>> +     its->flags |= ITS_FLAGS_WORKAROUND_CAVIUM_174;
>> +
>> +     return true;
>> +}
>> +
>>  static bool __maybe_unused its_enable_quirk_cavium_22375(void *data)
>>  {
>>       struct its_node *its = data;
>> @@ -3031,6 +3042,14 @@ static const struct gic_quirk its_quirks[] = {
>>               .init   = its_enable_quirk_hip07_161600802,
>>       },
>>  #endif
>> +#ifdef CONFIG_CAVIUM_ERRATUM_174
>> +     {
>> +             .desc   = "ITS: Cavium ThunderX2 erratum 174",
>> +             .iidr   = 0x13f,        /* ThunderX2 pass A1/A2/B0 */
>> +             .mask   = 0xffffffff,
>> +             .init   = its_enable_quirk_cavium_174,
>> +     },
>> +#endif
>>       {
>>       }
>>  };
>>
>
> Thanks,
>
>         M.
> --
> Jazz is not dead. It just smells funny...

thanks
Ganapat

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH v2] irqchip/gic-v3-its: Add workaround for ThunderX2 erratum #174
  2018-01-18  5:28 [PATCH v2] irqchip/gic-v3-its: Add workaround for ThunderX2 erratum #174 Ganapatrao Kulkarni
  2018-01-19 12:23 ` Marc Zyngier
@ 2018-01-21  7:00 ` Jayachandran C
  2018-01-21 11:35   ` Marc Zyngier
  1 sibling, 1 reply; 7+ messages in thread
From: Jayachandran C @ 2018-01-21  7:00 UTC (permalink / raw)
  To: Ganapatrao Kulkarni
  Cc: linux-doc, linux-kernel, linux-arm-kernel, marc.zyngier, tglx,
	jason, catalin.marinas, will.deacon, corbet, Robert.Richter,
	Jan.Glauber, Vadim.Lomovtsev, gklkml16

On Thu, Jan 18, 2018 at 10:58:20AM +0530, Ganapatrao Kulkarni wrote:
> This erratum is observed on the ThunderX2 GICv3 ITS. When a
> MOVI command is used to change affinity of a LPI to a collection/cpu
> on another node, the LPI is not delivered to the cpu.
> An additional INV command is required after the MOVI to deliver
> the LPI to the new destination.
> 
> If we add INV after MOVI, there is a chance that we lose LPIs which
> are raised when the affinity is changed. So for now, adding workaround fix
> to disable inter node affinity change.
> 
> Signed-off-by: Ganapatrao Kulkarni <ganapatrao.kulkarni@cavium.com>
> ---
> 
> v2: Added workaround to avoid inter node affinity change.
> 
> v1: Initial patch
> 
>  Documentation/arm64/silicon-errata.txt |  1 +
>  arch/arm64/Kconfig                     | 10 ++++++++++
>  drivers/irqchip/irq-gic-v3-its.c       | 21 ++++++++++++++++++++-
>  3 files changed, 31 insertions(+), 1 deletion(-)
> 
> diff --git a/Documentation/arm64/silicon-errata.txt b/Documentation/arm64/silicon-errata.txt
> index fc1c884..fb27cb5 100644
> --- a/Documentation/arm64/silicon-errata.txt
> +++ b/Documentation/arm64/silicon-errata.txt
> @@ -63,6 +63,7 @@ stable kernels.
>  | Cavium         | ThunderX Core   | #27456          | CAVIUM_ERRATUM_27456        |
>  | Cavium         | ThunderX Core   | #30115          | CAVIUM_ERRATUM_30115        |
>  | Cavium         | ThunderX SMMUv2 | #27704          | N/A                         |
> +| Cavium         | ThunderX2 ITS   | #174            | CAVIUM_ERRATUM_174          |
>  | Cavium         | ThunderX2 SMMUv3| #74             | N/A                         |
>  | Cavium         | ThunderX2 SMMUv3| #126            | N/A                         |
>  |                |                 |                 |                             |
> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> index c9a7e9e..0dbf3bd 100644
> --- a/arch/arm64/Kconfig
> +++ b/arch/arm64/Kconfig
> @@ -461,6 +461,16 @@ config ARM64_ERRATUM_843419
>  
>  	  If unsure, say Y.
>  
> +config CAVIUM_ERRATUM_174
> +	bool "Cavium ThunderX2 erratum 174"
> +	default y
> +	help
> +	  Cavium ThunderX2 dual socket systems may loose interrupts
> +	  on affinity change to a cpu on other node.
> +	  This workaround fix avoids inter node affinity change.

This has to be fixed up to match the commit message (and for spelling).
I have seen some questions offlist about how important this fix is,
and how it can affect users - so that would be useful to have in the
description as well.

To clarify, this errata comes into play only when the irq affinity is
forced from the node given by the device (and ITS) affinity to another
node.  This should not happen in normal, useful configurations.

Also, we will hold further posting of this errata until we do another
round of investigation with the hardware team for a better solution.
If we can handle the pending interrupts for the small window of MOVI/INV
in first workaround, we will not need this restriction at all.

JC.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH v2] irqchip/gic-v3-its: Add workaround for ThunderX2 erratum #174
  2018-01-21  7:00 ` Jayachandran C
@ 2018-01-21 11:35   ` Marc Zyngier
  2018-02-19 21:12     ` Jayachandran C
  0 siblings, 1 reply; 7+ messages in thread
From: Marc Zyngier @ 2018-01-21 11:35 UTC (permalink / raw)
  To: Jayachandran C
  Cc: Ganapatrao Kulkarni, linux-doc, linux-kernel, linux-arm-kernel,
	tglx, jason, catalin.marinas, will.deacon, corbet,
	Robert.Richter, Jan.Glauber, Vadim.Lomovtsev, gklkml16

On Sun, 21 Jan 2018 07:00:48 +0000,
Jayachandran C wrote:
> 
> On Thu, Jan 18, 2018 at 10:58:20AM +0530, Ganapatrao Kulkarni wrote:
> > This erratum is observed on the ThunderX2 GICv3 ITS. When a
> > MOVI command is used to change affinity of a LPI to a collection/cpu
> > on another node, the LPI is not delivered to the cpu.
> > An additional INV command is required after the MOVI to deliver
> > the LPI to the new destination.
> > 
> > If we add INV after MOVI, there is a chance that we lose LPIs which
> > are raised when the affinity is changed. So for now, adding workaround fix
> > to disable inter node affinity change.
> > 
> > Signed-off-by: Ganapatrao Kulkarni <ganapatrao.kulkarni@cavium.com>
> > ---
> > 
> > v2: Added workaround to avoid inter node affinity change.
> > 
> > v1: Initial patch
> > 
> >  Documentation/arm64/silicon-errata.txt |  1 +
> >  arch/arm64/Kconfig                     | 10 ++++++++++
> >  drivers/irqchip/irq-gic-v3-its.c       | 21 ++++++++++++++++++++-
> >  3 files changed, 31 insertions(+), 1 deletion(-)
> > 
> > diff --git a/Documentation/arm64/silicon-errata.txt b/Documentation/arm64/silicon-errata.txt
> > index fc1c884..fb27cb5 100644
> > --- a/Documentation/arm64/silicon-errata.txt
> > +++ b/Documentation/arm64/silicon-errata.txt
> > @@ -63,6 +63,7 @@ stable kernels.
> >  | Cavium         | ThunderX Core   | #27456          | CAVIUM_ERRATUM_27456        |
> >  | Cavium         | ThunderX Core   | #30115          | CAVIUM_ERRATUM_30115        |
> >  | Cavium         | ThunderX SMMUv2 | #27704          | N/A                         |
> > +| Cavium         | ThunderX2 ITS   | #174            | CAVIUM_ERRATUM_174          |
> >  | Cavium         | ThunderX2 SMMUv3| #74             | N/A                         |
> >  | Cavium         | ThunderX2 SMMUv3| #126            | N/A                         |
> >  |                |                 |                 |                             |
> > diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> > index c9a7e9e..0dbf3bd 100644
> > --- a/arch/arm64/Kconfig
> > +++ b/arch/arm64/Kconfig
> > @@ -461,6 +461,16 @@ config ARM64_ERRATUM_843419
> >  
> >  	  If unsure, say Y.
> >  
> > +config CAVIUM_ERRATUM_174
> > +	bool "Cavium ThunderX2 erratum 174"
> > +	default y
> > +	help
> > +	  Cavium ThunderX2 dual socket systems may loose interrupts
> > +	  on affinity change to a cpu on other node.
> > +	  This workaround fix avoids inter node affinity change.
> 
> This has to be fixed up to match the commit message (and for spelling).
> I have seen some questions offlist about how important this fix is,
> and how it can affect users - so that would be useful to have in the
> description as well.
> 
> To clarify, this errata comes into play only when the irq affinity is
> forced from the node given by the device (and ITS) affinity to another
> node.  This should not happen in normal, useful configurations.

Define normal. That's all under control of userspace, and the kernel
doesn't really have a say. irqbalance will happily move interrupts
around. Disable all CPUs from node at runtime (again, from userspace),
and you'll get the exact same thing. I can't see what's so "abnormal"
about any of that.

> Also, we will hold further posting of this errata until we do another
> round of investigation with the hardware team for a better solution.
> If we can handle the pending interrupts for the small window of MOVI/INV
> in first workaround, we will not need this restriction at all.

What do you mean by "If we can handle the pending interrupts for the
small window of MOVI/INV"? Taking the interrupt on the source CPU?
Sure, that would be fine. But that's assuming that the souce CPU is in
a position to actually handle this, and is not simply going down.

If there is only a slight possibility that you may loose an interrupt
in the MOVI/INV window (which is not that small, since that's a 4
command sequence), your only other solution is to inject a spurious
interrupt to replace the one you may have lost in that window.

In the meantime, and until I see a patch fixing this (or a decent
explanation of why this isn't a problem), I'll consider it broken.

Thanks,

	M.

-- 
Jazz is not dead, it just smell funny.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH v2] irqchip/gic-v3-its: Add workaround for ThunderX2 erratum #174
  2018-01-21 11:35   ` Marc Zyngier
@ 2018-02-19 21:12     ` Jayachandran C
  2018-02-20  9:07       ` Marc Zyngier
  0 siblings, 1 reply; 7+ messages in thread
From: Jayachandran C @ 2018-02-19 21:12 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: Ganapatrao Kulkarni, linux-doc, linux-kernel, linux-arm-kernel,
	tglx, jason, catalin.marinas, will.deacon, corbet,
	Robert.Richter, Jan.Glauber, Vadim.Lomovtsev, gklkml16

On Sun, Jan 21, 2018 at 11:35:34AM +0000, Marc Zyngier wrote:
> On Sun, 21 Jan 2018 07:00:48 +0000,
> Jayachandran C wrote:
> > 
> > On Thu, Jan 18, 2018 at 10:58:20AM +0530, Ganapatrao Kulkarni wrote:
> > > This erratum is observed on the ThunderX2 GICv3 ITS. When a
> > > MOVI command is used to change affinity of a LPI to a collection/cpu
> > > on another node, the LPI is not delivered to the cpu.
> > > An additional INV command is required after the MOVI to deliver
> > > the LPI to the new destination.
> > > 
> > > If we add INV after MOVI, there is a chance that we lose LPIs which
> > > are raised when the affinity is changed. So for now, adding workaround fix
> > > to disable inter node affinity change.
> > > 
> > > Signed-off-by: Ganapatrao Kulkarni <ganapatrao.kulkarni@cavium.com>
> > > ---
> > > 
> > > v2: Added workaround to avoid inter node affinity change.
> > > 
> > > v1: Initial patch
> > > 
> > >  Documentation/arm64/silicon-errata.txt |  1 +
> > >  arch/arm64/Kconfig                     | 10 ++++++++++
> > >  drivers/irqchip/irq-gic-v3-its.c       | 21 ++++++++++++++++++++-
> > >  3 files changed, 31 insertions(+), 1 deletion(-)
> > > 
> > > diff --git a/Documentation/arm64/silicon-errata.txt b/Documentation/arm64/silicon-errata.txt
> > > index fc1c884..fb27cb5 100644
> > > --- a/Documentation/arm64/silicon-errata.txt
> > > +++ b/Documentation/arm64/silicon-errata.txt
> > > @@ -63,6 +63,7 @@ stable kernels.
> > >  | Cavium         | ThunderX Core   | #27456          | CAVIUM_ERRATUM_27456        |
> > >  | Cavium         | ThunderX Core   | #30115          | CAVIUM_ERRATUM_30115        |
> > >  | Cavium         | ThunderX SMMUv2 | #27704          | N/A                         |
> > > +| Cavium         | ThunderX2 ITS   | #174            | CAVIUM_ERRATUM_174          |
> > >  | Cavium         | ThunderX2 SMMUv3| #74             | N/A                         |
> > >  | Cavium         | ThunderX2 SMMUv3| #126            | N/A                         |
> > >  |                |                 |                 |                             |
> > > diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> > > index c9a7e9e..0dbf3bd 100644
> > > --- a/arch/arm64/Kconfig
> > > +++ b/arch/arm64/Kconfig
> > > @@ -461,6 +461,16 @@ config ARM64_ERRATUM_843419
> > >  
> > >  	  If unsure, say Y.
> > >  
> > > +config CAVIUM_ERRATUM_174
> > > +	bool "Cavium ThunderX2 erratum 174"
> > > +	default y
> > > +	help
> > > +	  Cavium ThunderX2 dual socket systems may loose interrupts
> > > +	  on affinity change to a cpu on other node.
> > > +	  This workaround fix avoids inter node affinity change.
> > 
> > This has to be fixed up to match the commit message (and for spelling).
> > I have seen some questions offlist about how important this fix is,
> > and how it can affect users - so that would be useful to have in the
> > description as well.
> > 
> > To clarify, this errata comes into play only when the irq affinity is
> > forced from the node given by the device (and ITS) affinity to another
> > node.  This should not happen in normal, useful configurations.
> 
> Define normal. That's all under control of userspace, and the kernel
> doesn't really have a say. irqbalance will happily move interrupts
> around. Disable all CPUs from node at runtime (again, from userspace),
> and you'll get the exact same thing. I can't see what's so "abnormal"
> about any of that.
> 
> > Also, we will hold further posting of this errata until we do another
> > round of investigation with the hardware team for a better solution.
> > If we can handle the pending interrupts for the small window of MOVI/INV
> > in first workaround, we will not need this restriction at all.
> 
> What do you mean by "If we can handle the pending interrupts for the
> small window of MOVI/INV"? Taking the interrupt on the source CPU?
> Sure, that would be fine. But that's assuming that the souce CPU is in
> a position to actually handle this, and is not simply going down.
> 
> If there is only a slight possibility that you may loose an interrupt
> in the MOVI/INV window (which is not that small, since that's a 4
> command sequence), your only other solution is to inject a spurious
> interrupt to replace the one you may have lost in that window.
> 
> In the meantime, and until I see a patch fixing this (or a decent
> explanation of why this isn't a problem), I'll consider it broken.

After reviewing the issue with our hardware team, we decided to
tweak the redistributor cache configuration from firmware rather
than go with this errata workaournd in Linux (and other OSes).

So, with the new firmware MOVI will work across nodes as expected,
and this patch is no longer neeeded.

Thanks,
JC.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH v2] irqchip/gic-v3-its: Add workaround for ThunderX2 erratum #174
  2018-02-19 21:12     ` Jayachandran C
@ 2018-02-20  9:07       ` Marc Zyngier
  0 siblings, 0 replies; 7+ messages in thread
From: Marc Zyngier @ 2018-02-20  9:07 UTC (permalink / raw)
  To: Jayachandran C
  Cc: Ganapatrao Kulkarni, linux-doc, linux-kernel, linux-arm-kernel,
	tglx, jason, catalin.marinas, will.deacon, corbet,
	Robert.Richter, Jan.Glauber, Vadim.Lomovtsev, gklkml16

On Mon, 19 Feb 2018 21:12:10 +0000,
Jayachandran C wrote:
> 
> On Sun, Jan 21, 2018 at 11:35:34AM +0000, Marc Zyngier wrote:
> > On Sun, 21 Jan 2018 07:00:48 +0000,
> > Jayachandran C wrote:
> > > 
> > > On Thu, Jan 18, 2018 at 10:58:20AM +0530, Ganapatrao Kulkarni wrote:
> > > > This erratum is observed on the ThunderX2 GICv3 ITS. When a
> > > > MOVI command is used to change affinity of a LPI to a collection/cpu
> > > > on another node, the LPI is not delivered to the cpu.
> > > > An additional INV command is required after the MOVI to deliver
> > > > the LPI to the new destination.
> > > > 
> > > > If we add INV after MOVI, there is a chance that we lose LPIs which
> > > > are raised when the affinity is changed. So for now, adding workaround fix
> > > > to disable inter node affinity change.
> > > > 
> > > > Signed-off-by: Ganapatrao Kulkarni <ganapatrao.kulkarni@cavium.com>
> > > > ---
> > > > 
> > > > v2: Added workaround to avoid inter node affinity change.
> > > > 
> > > > v1: Initial patch
> > > > 
> > > >  Documentation/arm64/silicon-errata.txt |  1 +
> > > >  arch/arm64/Kconfig                     | 10 ++++++++++
> > > >  drivers/irqchip/irq-gic-v3-its.c       | 21 ++++++++++++++++++++-
> > > >  3 files changed, 31 insertions(+), 1 deletion(-)
> > > > 
> > > > diff --git a/Documentation/arm64/silicon-errata.txt b/Documentation/arm64/silicon-errata.txt
> > > > index fc1c884..fb27cb5 100644
> > > > --- a/Documentation/arm64/silicon-errata.txt
> > > > +++ b/Documentation/arm64/silicon-errata.txt
> > > > @@ -63,6 +63,7 @@ stable kernels.
> > > >  | Cavium         | ThunderX Core   | #27456          | CAVIUM_ERRATUM_27456        |
> > > >  | Cavium         | ThunderX Core   | #30115          | CAVIUM_ERRATUM_30115        |
> > > >  | Cavium         | ThunderX SMMUv2 | #27704          | N/A                         |
> > > > +| Cavium         | ThunderX2 ITS   | #174            | CAVIUM_ERRATUM_174          |
> > > >  | Cavium         | ThunderX2 SMMUv3| #74             | N/A                         |
> > > >  | Cavium         | ThunderX2 SMMUv3| #126            | N/A                         |
> > > >  |                |                 |                 |                             |
> > > > diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> > > > index c9a7e9e..0dbf3bd 100644
> > > > --- a/arch/arm64/Kconfig
> > > > +++ b/arch/arm64/Kconfig
> > > > @@ -461,6 +461,16 @@ config ARM64_ERRATUM_843419
> > > >  
> > > >  	  If unsure, say Y.
> > > >  
> > > > +config CAVIUM_ERRATUM_174
> > > > +	bool "Cavium ThunderX2 erratum 174"
> > > > +	default y
> > > > +	help
> > > > +	  Cavium ThunderX2 dual socket systems may loose interrupts
> > > > +	  on affinity change to a cpu on other node.
> > > > +	  This workaround fix avoids inter node affinity change.
> > > 
> > > This has to be fixed up to match the commit message (and for spelling).
> > > I have seen some questions offlist about how important this fix is,
> > > and how it can affect users - so that would be useful to have in the
> > > description as well.
> > > 
> > > To clarify, this errata comes into play only when the irq affinity is
> > > forced from the node given by the device (and ITS) affinity to another
> > > node.  This should not happen in normal, useful configurations.
> > 
> > Define normal. That's all under control of userspace, and the kernel
> > doesn't really have a say. irqbalance will happily move interrupts
> > around. Disable all CPUs from node at runtime (again, from userspace),
> > and you'll get the exact same thing. I can't see what's so "abnormal"
> > about any of that.
> > 
> > > Also, we will hold further posting of this errata until we do another
> > > round of investigation with the hardware team for a better solution.
> > > If we can handle the pending interrupts for the small window of MOVI/INV
> > > in first workaround, we will not need this restriction at all.
> > 
> > What do you mean by "If we can handle the pending interrupts for the
> > small window of MOVI/INV"? Taking the interrupt on the source CPU?
> > Sure, that would be fine. But that's assuming that the souce CPU is in
> > a position to actually handle this, and is not simply going down.
> > 
> > If there is only a slight possibility that you may loose an interrupt
> > in the MOVI/INV window (which is not that small, since that's a 4
> > command sequence), your only other solution is to inject a spurious
> > interrupt to replace the one you may have lost in that window.
> > 
> > In the meantime, and until I see a patch fixing this (or a decent
> > explanation of why this isn't a problem), I'll consider it broken.
> 
> After reviewing the issue with our hardware team, we decided to
> tweak the redistributor cache configuration from firmware rather
> than go with this errata workaournd in Linux (and other OSes).
> 
> So, with the new firmware MOVI will work across nodes as expected,
> and this patch is no longer neeeded.

That's good news! One question though: how do we detect the old (or
new) firmware? We'd still need mitigation for the "old firmware"
case, unless you have a way to guarantee that all existing boxes are
magically updated to a fixed version?

Thanks,

	M.

-- 
Jazz is not dead, it just smell funny.

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2018-02-20  9:07 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-01-18  5:28 [PATCH v2] irqchip/gic-v3-its: Add workaround for ThunderX2 erratum #174 Ganapatrao Kulkarni
2018-01-19 12:23 ` Marc Zyngier
2018-01-19 14:22   ` Ganapatrao Kulkarni
2018-01-21  7:00 ` Jayachandran C
2018-01-21 11:35   ` Marc Zyngier
2018-02-19 21:12     ` Jayachandran C
2018-02-20  9:07       ` Marc Zyngier

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).