linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [RFC PATCH 0/3] arm: Some IPI injection optimization
@ 2021-03-29  8:52 Jingyi Wang
  2021-03-29  8:52 ` [RFC PATCH 1/3] irqchip/gic-v3: Make use of ICC_SGI1R IRM bit Jingyi Wang
                   ` (2 more replies)
  0 siblings, 3 replies; 9+ messages in thread
From: Jingyi Wang @ 2021-03-29  8:52 UTC (permalink / raw)
  To: linux-kernel, linux-arm-kernel
  Cc: maz, tglx, wanghaibin.wang, wangjingyi11, yuzenghui, zhukeqian1

This series optimize arm IPI injection process by making use of
ICC_SGI1R IRM bit and implementing gic_ipi_send_single().

Jingyi Wang (3):
  irqchip/gic-v3: Make use of ICC_SGI1R IRM bit
  irqchip/gic-v3: Implement gic_ipi_send_single()
  arm/arm64: Use gic_ipi_send_single() to inject single IPI

 arch/arm/kernel/smp.c        | 16 +++++++++++++---
 arch/arm64/kernel/smp.c      | 16 +++++++++++++---
 drivers/irqchip/irq-gic-v3.c | 34 ++++++++++++++++++++++++++++++++++
 3 files changed, 60 insertions(+), 6 deletions(-)

-- 
2.19.1


^ permalink raw reply	[flat|nested] 9+ messages in thread

* [RFC PATCH 1/3] irqchip/gic-v3: Make use of ICC_SGI1R IRM bit
  2021-03-29  8:52 [RFC PATCH 0/3] arm: Some IPI injection optimization Jingyi Wang
@ 2021-03-29  8:52 ` Jingyi Wang
  2021-03-29  9:55   ` Marc Zyngier
  2021-03-29  8:52 ` [RFC PATCH 2/3] irqchip/gic-v3: Implement gic_ipi_send_single() Jingyi Wang
  2021-03-29  8:52 ` [RFC PATCH 3/3] arm/arm64: Use gic_ipi_send_single() to inject single IPI Jingyi Wang
  2 siblings, 1 reply; 9+ messages in thread
From: Jingyi Wang @ 2021-03-29  8:52 UTC (permalink / raw)
  To: linux-kernel, linux-arm-kernel
  Cc: maz, tglx, wanghaibin.wang, wangjingyi11, yuzenghui, zhukeqian1

IRM, bit[40] in ICC_SGI1R, determines how the generated SGIs
are distributed to PEs. If the bit is set, interrupts are routed
to all PEs in the system excluding "self". We use cpumask to
determine if this bit should be set and make use of that.

This will reduce vm trap when broadcast IPIs are sent.

Signed-off-by: Jingyi Wang <wangjingyi11@huawei.com>
---
 drivers/irqchip/irq-gic-v3.c | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/drivers/irqchip/irq-gic-v3.c b/drivers/irqchip/irq-gic-v3.c
index eb0ee356a629..8ecc1b274ea8 100644
--- a/drivers/irqchip/irq-gic-v3.c
+++ b/drivers/irqchip/irq-gic-v3.c
@@ -1127,6 +1127,7 @@ static void gic_send_sgi(u64 cluster_id, u16 tlist, unsigned int irq)
 static void gic_ipi_send_mask(struct irq_data *d, const struct cpumask *mask)
 {
 	int cpu;
+	cpumask_t tmp;
 
 	if (WARN_ON(d->hwirq >= 16))
 		return;
@@ -1137,6 +1138,17 @@ static void gic_ipi_send_mask(struct irq_data *d, const struct cpumask *mask)
 	 */
 	wmb();
 
+	if (!cpumask_and(&tmp, mask, cpumask_of(smp_processor_id()))) {
+		/* Set Interrupt Routing Mode bit */
+		u64 val;
+		val = (d->hwirq) << ICC_SGI1R_SGI_ID_SHIFT;
+		val |= BIT_ULL(ICC_SGI1R_IRQ_ROUTING_MODE_BIT);
+		gic_write_sgi1r(val);
+
+		isb();
+		return;
+	}
+
 	for_each_cpu(cpu, mask) {
 		u64 cluster_id = MPIDR_TO_SGI_CLUSTER_ID(cpu_logical_map(cpu));
 		u16 tlist;
-- 
2.19.1


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [RFC PATCH 2/3] irqchip/gic-v3: Implement gic_ipi_send_single()
  2021-03-29  8:52 [RFC PATCH 0/3] arm: Some IPI injection optimization Jingyi Wang
  2021-03-29  8:52 ` [RFC PATCH 1/3] irqchip/gic-v3: Make use of ICC_SGI1R IRM bit Jingyi Wang
@ 2021-03-29  8:52 ` Jingyi Wang
  2021-03-29  8:52 ` [RFC PATCH 3/3] arm/arm64: Use gic_ipi_send_single() to inject single IPI Jingyi Wang
  2 siblings, 0 replies; 9+ messages in thread
From: Jingyi Wang @ 2021-03-29  8:52 UTC (permalink / raw)
  To: linux-kernel, linux-arm-kernel
  Cc: maz, tglx, wanghaibin.wang, wangjingyi11, yuzenghui, zhukeqian1

We implement gic_ipi_send_single() to make single ipi injection
easier.

Signed-off-by: Jingyi Wang <wangjingyi11@huawei.com>
---
 drivers/irqchip/irq-gic-v3.c | 22 ++++++++++++++++++++++
 1 file changed, 22 insertions(+)

diff --git a/drivers/irqchip/irq-gic-v3.c b/drivers/irqchip/irq-gic-v3.c
index 8ecc1b274ea8..5c44e1e719d6 100644
--- a/drivers/irqchip/irq-gic-v3.c
+++ b/drivers/irqchip/irq-gic-v3.c
@@ -1124,6 +1124,26 @@ static void gic_send_sgi(u64 cluster_id, u16 tlist, unsigned int irq)
 	gic_write_sgi1r(val);
 }
 
+static void gic_ipi_send_single(struct irq_data *d, unsigned int cpu)
+{
+	unsigned long mpidr;
+	u64 cluster_id;
+	u16 tlist;
+
+	if (WARN_ON(d->hwirq >= 16))
+		return;
+
+	wmb();
+
+	mpidr = cpu_logical_map(cpu);
+	cluster_id = MPIDR_TO_SGI_CLUSTER_ID(mpidr);
+	tlist = 1 << (mpidr & 0xf);
+
+	gic_send_sgi(cluster_id, tlist, d->hwirq);
+
+	isb();
+}
+
 static void gic_ipi_send_mask(struct irq_data *d, const struct cpumask *mask)
 {
 	int cpu;
@@ -1279,6 +1299,7 @@ static struct irq_chip gic_chip = {
 	.irq_nmi_setup		= gic_irq_nmi_setup,
 	.irq_nmi_teardown	= gic_irq_nmi_teardown,
 	.ipi_send_mask		= gic_ipi_send_mask,
+	.ipi_send_single	= gic_ipi_send_single,
 	.flags			= IRQCHIP_SET_TYPE_MASKED |
 				  IRQCHIP_SKIP_SET_WAKE |
 				  IRQCHIP_MASK_ON_SUSPEND,
@@ -1298,6 +1319,7 @@ static struct irq_chip gic_eoimode1_chip = {
 	.irq_nmi_setup		= gic_irq_nmi_setup,
 	.irq_nmi_teardown	= gic_irq_nmi_teardown,
 	.ipi_send_mask		= gic_ipi_send_mask,
+	.ipi_send_single	= gic_ipi_send_single,
 	.flags			= IRQCHIP_SET_TYPE_MASKED |
 				  IRQCHIP_SKIP_SET_WAKE |
 				  IRQCHIP_MASK_ON_SUSPEND,
-- 
2.19.1


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [RFC PATCH 3/3] arm/arm64: Use gic_ipi_send_single() to inject single IPI
  2021-03-29  8:52 [RFC PATCH 0/3] arm: Some IPI injection optimization Jingyi Wang
  2021-03-29  8:52 ` [RFC PATCH 1/3] irqchip/gic-v3: Make use of ICC_SGI1R IRM bit Jingyi Wang
  2021-03-29  8:52 ` [RFC PATCH 2/3] irqchip/gic-v3: Implement gic_ipi_send_single() Jingyi Wang
@ 2021-03-29  8:52 ` Jingyi Wang
  2021-03-29 10:07   ` Marc Zyngier
  2 siblings, 1 reply; 9+ messages in thread
From: Jingyi Wang @ 2021-03-29  8:52 UTC (permalink / raw)
  To: linux-kernel, linux-arm-kernel
  Cc: maz, tglx, wanghaibin.wang, wangjingyi11, yuzenghui, zhukeqian1

Currently, arm use gic_ipi_send_mask() to inject single IPI, which
make the procedure a little complex. We use gic_ipi_send_single()
instead as some other archs.

Signed-off-by: Jingyi Wang <wangjingyi11@huawei.com>
---
 arch/arm/kernel/smp.c   | 16 +++++++++++++---
 arch/arm64/kernel/smp.c | 16 +++++++++++++---
 2 files changed, 26 insertions(+), 6 deletions(-)

diff --git a/arch/arm/kernel/smp.c b/arch/arm/kernel/smp.c
index 74679240a9d8..369ce529cdd8 100644
--- a/arch/arm/kernel/smp.c
+++ b/arch/arm/kernel/smp.c
@@ -534,6 +534,8 @@ static const char *ipi_types[NR_IPI] __tracepoint_string = {
 };
 
 static void smp_cross_call(const struct cpumask *target, unsigned int ipinr);
+static void smp_cross_call_single(const struct cpumask *target, int cpu,
+				  unsigned int ipinr);
 
 void show_ipi_list(struct seq_file *p, int prec)
 {
@@ -564,14 +566,15 @@ void arch_send_wakeup_ipi_mask(const struct cpumask *mask)
 
 void arch_send_call_function_single_ipi(int cpu)
 {
-	smp_cross_call(cpumask_of(cpu), IPI_CALL_FUNC);
+	smp_cross_call_single(cpumask_of(cpu), cpu, IPI_CALL_FUNC);
 }
 
 #ifdef CONFIG_IRQ_WORK
 void arch_irq_work_raise(void)
 {
+	int cpu = smp_processor_id();
 	if (arch_irq_work_has_interrupt())
-		smp_cross_call(cpumask_of(smp_processor_id()), IPI_IRQ_WORK);
+		smp_cross_call(cpumask_of(cpu), cpu, IPI_IRQ_WORK);
 }
 #endif
 
@@ -707,6 +710,13 @@ static void smp_cross_call(const struct cpumask *target, unsigned int ipinr)
 	__ipi_send_mask(ipi_desc[ipinr], target);
 }
 
+static void smp_cross_call_single(const struct cpumask *target, int cpu,
+				  unsigned int ipinr)
+{
+	trace_ipi_raise_rcuidle(target, ipi_types[ipinr]);
+	__ipi_send_single(ipi_desc[ipinr], cpu);
+}
+
 static void ipi_setup(int cpu)
 {
 	int i;
@@ -744,7 +754,7 @@ void __init set_smp_ipi_range(int ipi_base, int n)
 
 void smp_send_reschedule(int cpu)
 {
-	smp_cross_call(cpumask_of(cpu), IPI_RESCHEDULE);
+	smp_cross_call_single(cpumask_of(cpu), cpu, IPI_RESCHEDULE);
 }
 
 void smp_send_stop(void)
diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c
index 357590beaabb..d290b6dc5a6e 100644
--- a/arch/arm64/kernel/smp.c
+++ b/arch/arm64/kernel/smp.c
@@ -801,6 +801,8 @@ static const char *ipi_types[NR_IPI] __tracepoint_string = {
 };
 
 static void smp_cross_call(const struct cpumask *target, unsigned int ipinr);
+static void smp_cross_call_single(const struct cpumask *target, int cpu,
+				  unsigned int ipinr);
 
 unsigned long irq_err_count;
 
@@ -827,7 +829,7 @@ void arch_send_call_function_ipi_mask(const struct cpumask *mask)
 
 void arch_send_call_function_single_ipi(int cpu)
 {
-	smp_cross_call(cpumask_of(cpu), IPI_CALL_FUNC);
+	smp_cross_call_single(cpumask_of(cpu), cpu, IPI_CALL_FUNC);
 }
 
 #ifdef CONFIG_ARM64_ACPI_PARKING_PROTOCOL
@@ -840,7 +842,8 @@ void arch_send_wakeup_ipi_mask(const struct cpumask *mask)
 #ifdef CONFIG_IRQ_WORK
 void arch_irq_work_raise(void)
 {
-	smp_cross_call(cpumask_of(smp_processor_id()), IPI_IRQ_WORK);
+	int cpu = smp_processor_id();
+	smp_cross_call_single(cpumask_of(cpu), cpu, IPI_IRQ_WORK);
 }
 #endif
 
@@ -957,6 +960,13 @@ static void smp_cross_call(const struct cpumask *target, unsigned int ipinr)
 	__ipi_send_mask(ipi_desc[ipinr], target);
 }
 
+static void smp_cross_call_single(const struct cpumask *target, int cpu,
+				  unsigned int ipinr)
+{
+	trace_ipi_raise(target, ipi_types[ipinr]);
+	__ipi_send_single(ipi_desc[ipinr], cpu);
+}
+
 static void ipi_setup(int cpu)
 {
 	int i;
@@ -1007,7 +1017,7 @@ void __init set_smp_ipi_range(int ipi_base, int n)
 
 void smp_send_reschedule(int cpu)
 {
-	smp_cross_call(cpumask_of(cpu), IPI_RESCHEDULE);
+	smp_cross_call_single(cpumask_of(cpu), cpu, IPI_RESCHEDULE);
 }
 
 #ifdef CONFIG_GENERIC_CLOCKEVENTS_BROADCAST
-- 
2.19.1


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [RFC PATCH 1/3] irqchip/gic-v3: Make use of ICC_SGI1R IRM bit
  2021-03-29  8:52 ` [RFC PATCH 1/3] irqchip/gic-v3: Make use of ICC_SGI1R IRM bit Jingyi Wang
@ 2021-03-29  9:55   ` Marc Zyngier
  2021-03-29 10:38     ` Jingyi Wang
  0 siblings, 1 reply; 9+ messages in thread
From: Marc Zyngier @ 2021-03-29  9:55 UTC (permalink / raw)
  To: Jingyi Wang
  Cc: linux-kernel, linux-arm-kernel, tglx, wanghaibin.wang, yuzenghui,
	zhukeqian1

On Mon, 29 Mar 2021 09:52:08 +0100,
Jingyi Wang <wangjingyi11@huawei.com> wrote:
> 
> IRM, bit[40] in ICC_SGI1R, determines how the generated SGIs
> are distributed to PEs. If the bit is set, interrupts are routed
> to all PEs in the system excluding "self". We use cpumask to
> determine if this bit should be set and make use of that.
> 
> This will reduce vm trap when broadcast IPIs are sent.

I remember writing similar code about 4 years ago, only to realise
what:

- the cost of computing the resulting mask is pretty high for large
machines
- Linux almost never sends broadcast IPIs, so the complexity was all
in vain

What changed? Please provide supporting data showing how many IPIs we
actually save, and for which workload.

> 
> Signed-off-by: Jingyi Wang <wangjingyi11@huawei.com>
> ---
>  drivers/irqchip/irq-gic-v3.c | 12 ++++++++++++
>  1 file changed, 12 insertions(+)
> 
> diff --git a/drivers/irqchip/irq-gic-v3.c b/drivers/irqchip/irq-gic-v3.c
> index eb0ee356a629..8ecc1b274ea8 100644
> --- a/drivers/irqchip/irq-gic-v3.c
> +++ b/drivers/irqchip/irq-gic-v3.c
> @@ -1127,6 +1127,7 @@ static void gic_send_sgi(u64 cluster_id, u16 tlist, unsigned int irq)
>  static void gic_ipi_send_mask(struct irq_data *d, const struct cpumask *mask)
>  {
>  	int cpu;
> +	cpumask_t tmp;
>  
>  	if (WARN_ON(d->hwirq >= 16))
>  		return;
> @@ -1137,6 +1138,17 @@ static void gic_ipi_send_mask(struct irq_data *d, const struct cpumask *mask)
>  	 */
>  	wmb();
>  
> +	if (!cpumask_and(&tmp, mask, cpumask_of(smp_processor_id()))) {

Are you sure this does the right thing? This is checking that the
current CPU is not part of the mask. But it not checking that the mask
is actually "all but self".

This means you are potentially sending IPIs to CPUs that are not part
of the mask, making performance potentially worse.

Thanks,

	M.

> +		/* Set Interrupt Routing Mode bit */
> +		u64 val;
> +		val = (d->hwirq) << ICC_SGI1R_SGI_ID_SHIFT;
> +		val |= BIT_ULL(ICC_SGI1R_IRQ_ROUTING_MODE_BIT);
> +		gic_write_sgi1r(val);
> +
> +		isb();
> +		return;
> +	}
> +
>  	for_each_cpu(cpu, mask) {
>  		u64 cluster_id = MPIDR_TO_SGI_CLUSTER_ID(cpu_logical_map(cpu));
>  		u16 tlist;
> -- 
> 2.19.1
> 
> 

-- 
Without deviation from the norm, progress is not possible.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [RFC PATCH 3/3] arm/arm64: Use gic_ipi_send_single() to inject single IPI
  2021-03-29  8:52 ` [RFC PATCH 3/3] arm/arm64: Use gic_ipi_send_single() to inject single IPI Jingyi Wang
@ 2021-03-29 10:07   ` Marc Zyngier
  2021-03-29 11:03     ` Jingyi Wang
  0 siblings, 1 reply; 9+ messages in thread
From: Marc Zyngier @ 2021-03-29 10:07 UTC (permalink / raw)
  To: Jingyi Wang
  Cc: linux-kernel, linux-arm-kernel, tglx, wanghaibin.wang, yuzenghui,
	zhukeqian1

On Mon, 29 Mar 2021 09:52:10 +0100,
Jingyi Wang <wangjingyi11@huawei.com> wrote:
> 
> Currently, arm use gic_ipi_send_mask() to inject single IPI, which
> make the procedure a little complex. We use gic_ipi_send_single()
> instead as some other archs.
> 
> Signed-off-by: Jingyi Wang <wangjingyi11@huawei.com>
> ---
>  arch/arm/kernel/smp.c   | 16 +++++++++++++---
>  arch/arm64/kernel/smp.c | 16 +++++++++++++---
>  2 files changed, 26 insertions(+), 6 deletions(-)
> 
> diff --git a/arch/arm/kernel/smp.c b/arch/arm/kernel/smp.c
> index 74679240a9d8..369ce529cdd8 100644
> --- a/arch/arm/kernel/smp.c
> +++ b/arch/arm/kernel/smp.c
> @@ -534,6 +534,8 @@ static const char *ipi_types[NR_IPI] __tracepoint_string = {
>  };
>  
>  static void smp_cross_call(const struct cpumask *target, unsigned int ipinr);
> +static void smp_cross_call_single(const struct cpumask *target, int cpu,
> +				  unsigned int ipinr);

Why does this function need to take both a cpumask *and* a cpu, given
that they represent the same thing?

>
>  void show_ipi_list(struct seq_file *p, int prec)
>  {
> @@ -564,14 +566,15 @@ void arch_send_wakeup_ipi_mask(const struct cpumask *mask)
>  
>  void arch_send_call_function_single_ipi(int cpu)
>  {
> -	smp_cross_call(cpumask_of(cpu), IPI_CALL_FUNC);
> +	smp_cross_call_single(cpumask_of(cpu), cpu, IPI_CALL_FUNC);
>  }
>  
>  #ifdef CONFIG_IRQ_WORK
>  void arch_irq_work_raise(void)
>  {
> +	int cpu = smp_processor_id();
>  	if (arch_irq_work_has_interrupt())
> -		smp_cross_call(cpumask_of(smp_processor_id()), IPI_IRQ_WORK);
> +		smp_cross_call(cpumask_of(cpu), cpu, IPI_IRQ_WORK);

Why isn't that a call to smp_cross_call_single()?

>  }
>  #endif
>  
> @@ -707,6 +710,13 @@ static void smp_cross_call(const struct cpumask *target, unsigned int ipinr)
>  	__ipi_send_mask(ipi_desc[ipinr], target);
>  }
>  
> +static void smp_cross_call_single(const struct cpumask *target, int cpu,
> +				  unsigned int ipinr)
> +{
> +	trace_ipi_raise_rcuidle(target, ipi_types[ipinr]);

Why don't you compute the cpumask here^^?

> +	__ipi_send_single(ipi_desc[ipinr], cpu);
> +}
> +
>  static void ipi_setup(int cpu)
>  {
>  	int i;
> @@ -744,7 +754,7 @@ void __init set_smp_ipi_range(int ipi_base, int n)
>  
>  void smp_send_reschedule(int cpu)
>  {
> -	smp_cross_call(cpumask_of(cpu), IPI_RESCHEDULE);
> +	smp_cross_call_single(cpumask_of(cpu), cpu, IPI_RESCHEDULE);
>  }
>  
>  void smp_send_stop(void)
> diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c
> index 357590beaabb..d290b6dc5a6e 100644
> --- a/arch/arm64/kernel/smp.c
> +++ b/arch/arm64/kernel/smp.c

Similar comments for the arm64 side.

Overall, this needs to be backed by data that indicates that there is
an actual benefit for this extra complexity.

Thanks,

	M.

-- 
Without deviation from the norm, progress is not possible.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [RFC PATCH 1/3] irqchip/gic-v3: Make use of ICC_SGI1R IRM bit
  2021-03-29  9:55   ` Marc Zyngier
@ 2021-03-29 10:38     ` Jingyi Wang
  2021-03-29 11:28       ` Marc Zyngier
  0 siblings, 1 reply; 9+ messages in thread
From: Jingyi Wang @ 2021-03-29 10:38 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: linux-kernel, linux-arm-kernel, tglx, wanghaibin.wang, yuzenghui,
	zhukeqian1



On 3/29/2021 5:55 PM, Marc Zyngier wrote:
> On Mon, 29 Mar 2021 09:52:08 +0100,
> Jingyi Wang <wangjingyi11@huawei.com> wrote:
>>
>> IRM, bit[40] in ICC_SGI1R, determines how the generated SGIs
>> are distributed to PEs. If the bit is set, interrupts are routed
>> to all PEs in the system excluding "self". We use cpumask to
>> determine if this bit should be set and make use of that.
>>
>> This will reduce vm trap when broadcast IPIs are sent.
> 
> I remember writing similar code about 4 years ago, only to realise
> what:
> 
> - the cost of computing the resulting mask is pretty high for large
> machines
> - Linux almost never sends broadcast IPIs, so the complexity was all
> in vain
> 
> What changed? Please provide supporting data showing how many IPIs we
> actually save, and for which workload.
Maybe we can implement send_IPI_allbutself hooks as other some other 
archs instead of computing cpumask here?

>>
>> Signed-off-by: Jingyi Wang <wangjingyi11@huawei.com>
>> ---
>>   drivers/irqchip/irq-gic-v3.c | 12 ++++++++++++
>>   1 file changed, 12 insertions(+)
>>
>> diff --git a/drivers/irqchip/irq-gic-v3.c b/drivers/irqchip/irq-gic-v3.c
>> index eb0ee356a629..8ecc1b274ea8 100644
>> --- a/drivers/irqchip/irq-gic-v3.c
>> +++ b/drivers/irqchip/irq-gic-v3.c
>> @@ -1127,6 +1127,7 @@ static void gic_send_sgi(u64 cluster_id, u16 tlist, unsigned int irq)
>>   static void gic_ipi_send_mask(struct irq_data *d, const struct cpumask *mask)
>>   {
>>   	int cpu;
>> +	cpumask_t tmp;
>>   
>>   	if (WARN_ON(d->hwirq >= 16))
>>   		return;
>> @@ -1137,6 +1138,17 @@ static void gic_ipi_send_mask(struct irq_data *d, const struct cpumask *mask)
>>   	 */
>>   	wmb();
>>   
>> +	if (!cpumask_and(&tmp, mask, cpumask_of(smp_processor_id()))) {
> 
> Are you sure this does the right thing? This is checking that the
> current CPU is not part of the mask. But it not checking that the mask
> is actually "all but self".
> 
> This means you are potentially sending IPIs to CPUs that are not part
> of the mask, making performance potentially worse.
> 
> Thanks,
> 
> 	M.
> 
I will fix that,thanks.

>> +		/* Set Interrupt Routing Mode bit */
>> +		u64 val;
>> +		val = (d->hwirq) << ICC_SGI1R_SGI_ID_SHIFT;
>> +		val |= BIT_ULL(ICC_SGI1R_IRQ_ROUTING_MODE_BIT);
>> +		gic_write_sgi1r(val);
>> +
>> +		isb();
>> +		return;
>> +	}
>> +
>>   	for_each_cpu(cpu, mask) {
>>   		u64 cluster_id = MPIDR_TO_SGI_CLUSTER_ID(cpu_logical_map(cpu));
>>   		u16 tlist;
>> -- 
>> 2.19.1
>>
>>
> 

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [RFC PATCH 3/3] arm/arm64: Use gic_ipi_send_single() to inject single IPI
  2021-03-29 10:07   ` Marc Zyngier
@ 2021-03-29 11:03     ` Jingyi Wang
  0 siblings, 0 replies; 9+ messages in thread
From: Jingyi Wang @ 2021-03-29 11:03 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: linux-kernel, linux-arm-kernel, tglx, wanghaibin.wang, yuzenghui,
	zhukeqian1



On 3/29/2021 6:07 PM, Marc Zyngier wrote:
> On Mon, 29 Mar 2021 09:52:10 +0100,
> Jingyi Wang <wangjingyi11@huawei.com> wrote:
>>
>> Currently, arm use gic_ipi_send_mask() to inject single IPI, which
>> make the procedure a little complex. We use gic_ipi_send_single()
>> instead as some other archs.
>>
>> Signed-off-by: Jingyi Wang <wangjingyi11@huawei.com>
>> ---
>>   arch/arm/kernel/smp.c   | 16 +++++++++++++---
>>   arch/arm64/kernel/smp.c | 16 +++++++++++++---
>>   2 files changed, 26 insertions(+), 6 deletions(-)
>>
>> diff --git a/arch/arm/kernel/smp.c b/arch/arm/kernel/smp.c
>> index 74679240a9d8..369ce529cdd8 100644
>> --- a/arch/arm/kernel/smp.c
>> +++ b/arch/arm/kernel/smp.c
>> @@ -534,6 +534,8 @@ static const char *ipi_types[NR_IPI] __tracepoint_string = {
>>   };
>>   
>>   static void smp_cross_call(const struct cpumask *target, unsigned int ipinr);
>> +static void smp_cross_call_single(const struct cpumask *target, int cpu,
>> +				  unsigned int ipinr);
> 
> Why does this function need to take both a cpumask *and* a cpu, given
> that they represent the same thing?
> I was intended to use the extra param to reuse trace_ipi_raise_rcuidle.

>>
>>   void show_ipi_list(struct seq_file *p, int prec)
>>   {
>> @@ -564,14 +566,15 @@ void arch_send_wakeup_ipi_mask(const struct cpumask *mask)
>>   
>>   void arch_send_call_function_single_ipi(int cpu)
>>   {
>> -	smp_cross_call(cpumask_of(cpu), IPI_CALL_FUNC);
>> +	smp_cross_call_single(cpumask_of(cpu), cpu, IPI_CALL_FUNC);
>>   }
>>   
>>   #ifdef CONFIG_IRQ_WORK
>>   void arch_irq_work_raise(void)
>>   {
>> +	int cpu = smp_processor_id();
>>   	if (arch_irq_work_has_interrupt())
>> -		smp_cross_call(cpumask_of(smp_processor_id()), IPI_IRQ_WORK);
>> +		smp_cross_call(cpumask_of(cpu), cpu, IPI_IRQ_WORK);
> 
> Why isn't that a call to smp_cross_call_single()?
> 
I ignored that, thanks.

>>   }
>>   #endif
>>   
>> @@ -707,6 +710,13 @@ static void smp_cross_call(const struct cpumask *target, unsigned int ipinr)
>>   	__ipi_send_mask(ipi_desc[ipinr], target);
>>   }
>>   
>> +static void smp_cross_call_single(const struct cpumask *target, int cpu,
>> +				  unsigned int ipinr)
>> +{
>> +	trace_ipi_raise_rcuidle(target, ipi_types[ipinr]);
> 
> Why don't you compute the cpumask here^^?
> 
>> +	__ipi_send_single(ipi_desc[ipinr], cpu);
>> +}
>> +
>>   static void ipi_setup(int cpu)
>>   {
>>   	int i;
>> @@ -744,7 +754,7 @@ void __init set_smp_ipi_range(int ipi_base, int n)
>>   
>>   void smp_send_reschedule(int cpu)
>>   {
>> -	smp_cross_call(cpumask_of(cpu), IPI_RESCHEDULE);
>> +	smp_cross_call_single(cpumask_of(cpu), cpu, IPI_RESCHEDULE);
>>   }
>>   
>>   void smp_send_stop(void)
>> diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c
>> index 357590beaabb..d290b6dc5a6e 100644
>> --- a/arch/arm64/kernel/smp.c
>> +++ b/arch/arm64/kernel/smp.c
> 
> Similar comments for the arm64 side.
> 
> Overall, this needs to be backed by data that indicates that there is
> an actual benefit for this extra complexity.
> 
> Thanks,
> 
> 	M.
> 

Firstly, I implemented exitless-IPIs to reduce VM trap caused by sending
IPI as what x86 does here:
https://patchwork.kernel.org/project/kvm/cover/1532327996-17619-1-git-send-email-wanpengli@tencent.com/
Then I realized that sending ipi mask usually cause sending IPIs all
but self as IRM bit defines.
So do you think we can use IRM if we can avoid extra cost like computing
mask, or using broadcast IPIs is just meaningless for now?

Thanks,
Jingyi

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [RFC PATCH 1/3] irqchip/gic-v3: Make use of ICC_SGI1R IRM bit
  2021-03-29 10:38     ` Jingyi Wang
@ 2021-03-29 11:28       ` Marc Zyngier
  0 siblings, 0 replies; 9+ messages in thread
From: Marc Zyngier @ 2021-03-29 11:28 UTC (permalink / raw)
  To: Jingyi Wang
  Cc: linux-kernel, linux-arm-kernel, tglx, wanghaibin.wang, yuzenghui,
	zhukeqian1

On Mon, 29 Mar 2021 11:38:04 +0100,
Jingyi Wang <wangjingyi11@huawei.com> wrote:
> 
> 
> 
> On 3/29/2021 5:55 PM, Marc Zyngier wrote:
> > On Mon, 29 Mar 2021 09:52:08 +0100,
> > Jingyi Wang <wangjingyi11@huawei.com> wrote:
> >> 
> >> IRM, bit[40] in ICC_SGI1R, determines how the generated SGIs
> >> are distributed to PEs. If the bit is set, interrupts are routed
> >> to all PEs in the system excluding "self". We use cpumask to
> >> determine if this bit should be set and make use of that.
> >> 
> >> This will reduce vm trap when broadcast IPIs are sent.
> > 
> > I remember writing similar code about 4 years ago, only to realise
> > what:
> > 
> > - the cost of computing the resulting mask is pretty high for large
> > machines
> > - Linux almost never sends broadcast IPIs, so the complexity was all
> > in vain
> > 
> > What changed? Please provide supporting data showing how many IPIs we
> > actually save, and for which workload.
> Maybe we can implement send_IPI_allbutself hooks as other some other
> archs instead of computing cpumask here?

The question remains: how often is that used? x86 uses it only for NMI
(we don't broadcast our pseudo-NMI) and reboot, it seems. Anything I
missed? Do we have a different use case on arm64?

At the moment, this doesn't seem very useful.

Thanks,

	M.

-- 
Without deviation from the norm, progress is not possible.

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2021-03-29 11:29 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-03-29  8:52 [RFC PATCH 0/3] arm: Some IPI injection optimization Jingyi Wang
2021-03-29  8:52 ` [RFC PATCH 1/3] irqchip/gic-v3: Make use of ICC_SGI1R IRM bit Jingyi Wang
2021-03-29  9:55   ` Marc Zyngier
2021-03-29 10:38     ` Jingyi Wang
2021-03-29 11:28       ` Marc Zyngier
2021-03-29  8:52 ` [RFC PATCH 2/3] irqchip/gic-v3: Implement gic_ipi_send_single() Jingyi Wang
2021-03-29  8:52 ` [RFC PATCH 3/3] arm/arm64: Use gic_ipi_send_single() to inject single IPI Jingyi Wang
2021-03-29 10:07   ` Marc Zyngier
2021-03-29 11:03     ` Jingyi Wang

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).