* [PATCH v3] x86/hyper-v: micro-optimize send_ipi_one case
@ 2019-10-27 15:19 Vitaly Kuznetsov
  2019-10-27 16:49 ` Michael Kelley
                   ` (3 more replies)
  0 siblings, 4 replies; 6+ messages in thread
From: Vitaly Kuznetsov @ 2019-10-27 15:19 UTC (permalink / raw)
  To: linux-hyperv
  Cc: linux-kernel, x86, K. Y. Srinivasan, Haiyang Zhang,
	Stephen Hemminger, Sasha Levin, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, H. Peter Anvin, Roman Kagan, Michael Kelley,
	Joe Perches

When sending an IPI to a single CPU there is no need to deal with cpumasks.
With a 2-CPU guest on WS2019 I'm seeing a minor (~3%, 8043 -> 7761 CPU
cycles) improvement with an smp_call_function_single() loop benchmark. The
optimization, however, is tiny and straightforward. Also, send_ipi_one() is
important for the PV spinlock kick.

I was also wondering whether it would make sense to switch to using the
regular APIC IPI send for the CPU > 64 case, but no, it is twice as
expensive (12650 CPU cycles for a __send_ipi_mask_ex() call, 26000 for
orig_apic.send_IPI(cpu, vector)).
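
For reference, a minimal sketch of the kind of benchmark loop behind the
numbers above (a hypothetical throwaway module, not the exact harness
used; it assumes CPU 1 is online and that init runs on another CPU) could
look like this:

#include <linux/module.h>
#include <linux/smp.h>
#include <asm/msr.h>	/* rdtsc() */

static void dummy_func(void *unused)
{
}

static int __init ipi_bench_init(void)
{
	const int iters = 100000;
	u64 start, cycles;
	int i;

	start = rdtsc();
	for (i = 0; i < iters; i++)
		/* Synchronous single-target IPI: send and wait for completion. */
		smp_call_function_single(1, dummy_func, NULL, 1);
	cycles = rdtsc() - start;

	pr_info("ipi_bench: ~%llu cycles per smp_call_function_single()\n",
		cycles / iters);

	/* Fail on purpose so the module never stays loaded. */
	return -EAGAIN;
}
module_init(ipi_bench_init);

MODULE_LICENSE("GPL");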

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
---
Changes since v2:
 - Check VP number instead of CPU number against >= 64 [Michael]
 - Check for VP_INVAL
---
 arch/x86/hyperv/hv_apic.c           | 16 +++++++++++++---
 arch/x86/include/asm/trace/hyperv.h | 15 +++++++++++++++
 2 files changed, 28 insertions(+), 3 deletions(-)

diff --git a/arch/x86/hyperv/hv_apic.c b/arch/x86/hyperv/hv_apic.c
index e01078e93dd3..40e0e322161d 100644
--- a/arch/x86/hyperv/hv_apic.c
+++ b/arch/x86/hyperv/hv_apic.c
@@ -194,10 +194,20 @@ static bool __send_ipi_mask(const struct cpumask *mask, int vector)
 
 static bool __send_ipi_one(int cpu, int vector)
 {
-	struct cpumask mask = CPU_MASK_NONE;
+	int vp = hv_cpu_number_to_vp_number(cpu);
 
-	cpumask_set_cpu(cpu, &mask);
-	return __send_ipi_mask(&mask, vector);
+	trace_hyperv_send_ipi_one(cpu, vector);
+
+	if (!hv_hypercall_pg || (vp == VP_INVAL))
+		return false;
+
+	if ((vector < HV_IPI_LOW_VECTOR) || (vector > HV_IPI_HIGH_VECTOR))
+		return false;
+
+	if (vp >= 64)
+		return __send_ipi_mask_ex(cpumask_of(cpu), vector);
+
+	return !hv_do_fast_hypercall16(HVCALL_SEND_IPI, vector, BIT_ULL(vp));
 }
 
 static void hv_send_ipi(int cpu, int vector)
diff --git a/arch/x86/include/asm/trace/hyperv.h b/arch/x86/include/asm/trace/hyperv.h
index ace464f09681..4d705cb4d63b 100644
--- a/arch/x86/include/asm/trace/hyperv.h
+++ b/arch/x86/include/asm/trace/hyperv.h
@@ -71,6 +71,21 @@ TRACE_EVENT(hyperv_send_ipi_mask,
 		      __entry->ncpus, __entry->vector)
 	);
 
+TRACE_EVENT(hyperv_send_ipi_one,
+	    TP_PROTO(int cpu,
+		     int vector),
+	    TP_ARGS(cpu, vector),
+	    TP_STRUCT__entry(
+		    __field(int, cpu)
+		    __field(int, vector)
+		    ),
+	    TP_fast_assign(__entry->cpu = cpu;
+			   __entry->vector = vector;
+		    ),
+	    TP_printk("cpu %d vector %x",
+		      __entry->cpu, __entry->vector)
+	);
+
 #endif /* CONFIG_HYPERV */
 
 #undef TRACE_INCLUDE_PATH
-- 
2.20.1
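
A side note on the vp >= 64 split in __send_ipi_one() above: the non-EX
HVCALL_SEND_IPI fast hypercall carries its targets as a single 64-bit VP
bitmap, so BIT_ULL(vp) can only address VP numbers 0..63; anything larger
has to go through __send_ipi_mask_ex(), which passes a variable-size
sparse VP set instead. A small user-space illustration of that limit
(hypothetical, for exposition only):

#include <stdint.h>
#include <stdio.h>

#define BIT_ULL(n) (1ULL << (n))	/* same idea as the kernel macro */

int main(void)
{
	int vp;

	for (vp = 62; vp <= 65; vp++) {
		if (vp < 64)	/* one u64 bitmap covers VPs 0..63 */
			printf("vp %d -> fast path, mask 0x%016llx\n",
			       vp, (unsigned long long)BIT_ULL(vp));
		else
			printf("vp %d -> __send_ipi_mask_ex() fallback\n", vp);
	}
	return 0;
}

Once applied, the new tracepoint should show up in tracefs under the
hyperv trace system (events/hyperv/hyperv_send_ipi_one).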



* RE: [PATCH v3] x86/hyper-v: micro-optimize send_ipi_one case
  2019-10-27 15:19 [PATCH v3] x86/hyper-v: micro-optimize send_ipi_one case Vitaly Kuznetsov
@ 2019-10-27 16:49 ` Michael Kelley
  2019-10-28  9:35 ` Roman Kagan
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 6+ messages in thread
From: Michael Kelley @ 2019-10-27 16:49 UTC (permalink / raw)
  To: vkuznets, linux-hyperv
  Cc: linux-kernel, x86, KY Srinivasan, Haiyang Zhang,
	Stephen Hemminger, Sasha Levin, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, H. Peter Anvin, Roman Kagan, Joe Perches

From: Vitaly Kuznetsov <vkuznets@redhat.com> Sent: Sunday, October 27, 2019 8:20 AM
> 
> When sending an IPI to a single CPU there is no need to deal with cpumasks.
> With a 2-CPU guest on WS2019 I'm seeing a minor (~3%, 8043 -> 7761 CPU
> cycles) improvement with an smp_call_function_single() loop benchmark. The
> optimization, however, is tiny and straightforward. Also, send_ipi_one() is
> important for the PV spinlock kick.
> 
> I was also wondering whether it would make sense to switch to using the
> regular APIC IPI send for the CPU > 64 case, but no, it is twice as
> expensive (12650 CPU cycles for a __send_ipi_mask_ex() call, 26000 for
> orig_apic.send_IPI(cpu, vector)).
> 
> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
> ---
> Changes since v2:
>  - Check VP number instead of CPU number against >= 64 [Michael]
>  - Check for VP_INVAL
>

Reviewed-by: Michael Kelley <mikelley@microsoft.com>

* Re: [PATCH v3] x86/hyper-v: micro-optimize send_ipi_one case
  2019-10-27 15:19 [PATCH v3] x86/hyper-v: micro-optimize send_ipi_one case Vitaly Kuznetsov
  2019-10-27 16:49 ` Michael Kelley
@ 2019-10-28  9:35 ` Roman Kagan
  2019-11-07 13:26 ` Vitaly Kuznetsov
  2019-11-12 10:50 ` [tip: x86/hyperv] x86/hyperv: Micro-optimize send_ipi_one() tip-bot2 for Vitaly Kuznetsov
  3 siblings, 0 replies; 6+ messages in thread
From: Roman Kagan @ 2019-10-28  9:35 UTC (permalink / raw)
  To: Vitaly Kuznetsov
  Cc: linux-hyperv, linux-kernel, x86, K. Y. Srinivasan, Haiyang Zhang,
	Stephen Hemminger, Sasha Levin, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, H. Peter Anvin, Michael Kelley, Joe Perches

On Sun, Oct 27, 2019 at 04:19:38PM +0100, Vitaly Kuznetsov wrote:
> When sending an IPI to a single CPU there is no need to deal with cpumasks.
> With a 2-CPU guest on WS2019 I'm seeing a minor (~3%, 8043 -> 7761 CPU
> cycles) improvement with an smp_call_function_single() loop benchmark. The
> optimization, however, is tiny and straightforward. Also, send_ipi_one() is
> important for the PV spinlock kick.
> 
> I was also wondering whether it would make sense to switch to using the
> regular APIC IPI send for the CPU > 64 case, but no, it is twice as
> expensive (12650 CPU cycles for a __send_ipi_mask_ex() call, 26000 for
> orig_apic.send_IPI(cpu, vector)).
> 
> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
> ---
> Changes since v2:
>  - Check VP number instead of CPU number against >= 64 [Michael]
>  - Check for VP_INVAL
> ---
>  arch/x86/hyperv/hv_apic.c           | 16 +++++++++++++---
>  arch/x86/include/asm/trace/hyperv.h | 15 +++++++++++++++
>  2 files changed, 28 insertions(+), 3 deletions(-)

Reviewed-by: Roman Kagan <rkagan@virtuozzo.com>


* Re: [PATCH v3] x86/hyper-v: micro-optimize send_ipi_one case
  2019-10-27 15:19 [PATCH v3] x86/hyper-v: micro-optimize send_ipi_one case Vitaly Kuznetsov
  2019-10-27 16:49 ` Michael Kelley
  2019-10-28  9:35 ` Roman Kagan
@ 2019-11-07 13:26 ` Vitaly Kuznetsov
  2019-11-07 21:21   ` Thomas Gleixner
  2019-11-12 10:50 ` [tip: x86/hyperv] x86/hyperv: Micro-optimize send_ipi_one() tip-bot2 for Vitaly Kuznetsov
  3 siblings, 1 reply; 6+ messages in thread
From: Vitaly Kuznetsov @ 2019-11-07 13:26 UTC (permalink / raw)
  To: Sasha Levin
  Cc: linux-hyperv, linux-kernel, x86, K. Y. Srinivasan, Haiyang Zhang,
	Stephen Hemminger, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	H. Peter Anvin, Roman Kagan, Michael Kelley, Joe Perches

Vitaly Kuznetsov <vkuznets@redhat.com> writes:

> When sending an IPI to a single CPU there is no need to deal with cpumasks.
> With a 2-CPU guest on WS2019 I'm seeing a minor (~3%, 8043 -> 7761 CPU
> cycles) improvement with an smp_call_function_single() loop benchmark. The
> optimization, however, is tiny and straightforward. Also, send_ipi_one() is
> important for the PV spinlock kick.
>
> I was also wondering whether it would make sense to switch to using the
> regular APIC IPI send for the CPU > 64 case, but no, it is twice as
> expensive (12650 CPU cycles for a __send_ipi_mask_ex() call, 26000 for
> orig_apic.send_IPI(cpu, vector)).
>
> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
> ---
> Changes since v2:
>  - Check VP number instead of CPU number against >= 64 [Michael]
>  - Check for VP_INVAL

Hi Sasha,

do you have plans to pick this up for hyperv-next or should we ask x86
folks to?

Thanks!

-- 
Vitaly


* Re: [PATCH v3] x86/hyper-v: micro-optimize send_ipi_one case
  2019-11-07 13:26 ` Vitaly Kuznetsov
@ 2019-11-07 21:21   ` Thomas Gleixner
  0 siblings, 0 replies; 6+ messages in thread
From: Thomas Gleixner @ 2019-11-07 21:21 UTC (permalink / raw)
  To: Vitaly Kuznetsov
  Cc: Sasha Levin, linux-hyperv, linux-kernel, x86, K. Y. Srinivasan,
	Haiyang Zhang, Stephen Hemminger, Ingo Molnar, Borislav Petkov,
	H. Peter Anvin, Roman Kagan, Michael Kelley, Joe Perches

On Thu, 7 Nov 2019, Vitaly Kuznetsov wrote:

> Vitaly Kuznetsov <vkuznets@redhat.com> writes:
> 
> > When sending an IPI to a single CPU there is no need to deal with cpumasks.
> > With a 2-CPU guest on WS2019 I'm seeing a minor (~3%, 8043 -> 7761 CPU
> > cycles) improvement with an smp_call_function_single() loop benchmark. The
> > optimization, however, is tiny and straightforward. Also, send_ipi_one() is
> > important for the PV spinlock kick.
> >
> > I was also wondering whether it would make sense to switch to using the
> > regular APIC IPI send for the CPU > 64 case, but no, it is twice as
> > expensive (12650 CPU cycles for a __send_ipi_mask_ex() call, 26000 for
> > orig_apic.send_IPI(cpu, vector)).
> >
> > Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
> > ---
> > Changes since v2:
> >  - Check VP number instead of CPU number against >= 64 [Michael]
> >  - Check for VP_INVAL
> 
> Hi Sasha,
> 
> do you have plans to pick this up for hyperv-next or should we ask x86
> folks to?

I'm picking up the constant TSC one anyway, so I can just throw that in as
well.

Thanks,

	tglx


* [tip: x86/hyperv] x86/hyperv: Micro-optimize send_ipi_one()
  2019-10-27 15:19 [PATCH v3] x86/hyper-v: micro-optimize send_ipi_one case Vitaly Kuznetsov
                   ` (2 preceding siblings ...)
  2019-11-07 13:26 ` Vitaly Kuznetsov
@ 2019-11-12 10:50 ` tip-bot2 for Vitaly Kuznetsov
  3 siblings, 0 replies; 6+ messages in thread
From: tip-bot2 for Vitaly Kuznetsov @ 2019-11-12 10:50 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Vitaly Kuznetsov, Thomas Gleixner, Michael Kelley, Roman Kagan,
	Ingo Molnar, Borislav Petkov, linux-kernel

The following commit has been merged into the x86/hyperv branch of tip:

Commit-ID:     b264f57fde0c686c5c1dfdd0c21992c49196bb87
Gitweb:        https://git.kernel.org/tip/b264f57fde0c686c5c1dfdd0c21992c49196bb87
Author:        Vitaly Kuznetsov <vkuznets@redhat.com>
AuthorDate:    Sun, 27 Oct 2019 16:19:38 +01:00
Committer:     Thomas Gleixner <tglx@linutronix.de>
CommitterDate: Tue, 12 Nov 2019 11:44:20 +01:00

x86/hyperv: Micro-optimize send_ipi_one()

When sending an IPI to a single CPU there is no need to deal with cpumasks.

With a 2-CPU guest on WS2019 a minor (~3%, 8043 -> 7761 CPU cycles)
improvement with an smp_call_function_single() loop benchmark can be seen.
The optimization, however, is tiny and straightforward. Also,
send_ipi_one() is important for the PV spinlock kick.

Switching to the regular APIC IPI send for the CPU > 64 case does not make
sense as it is twice as expensive (12650 CPU cycles for a
__send_ipi_mask_ex() call, 26000 for orig_apic.send_IPI(cpu, vector)).

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Michael Kelley <mikelley@microsoft.com>
Reviewed-by: Roman Kagan <rkagan@virtuozzo.com>
Link: https://lkml.kernel.org/r/20191027151938.7296-1-vkuznets@redhat.com

---
 arch/x86/hyperv/hv_apic.c           | 16 +++++++++++++---
 arch/x86/include/asm/trace/hyperv.h | 15 +++++++++++++++
 2 files changed, 28 insertions(+), 3 deletions(-)

diff --git a/arch/x86/hyperv/hv_apic.c b/arch/x86/hyperv/hv_apic.c
index 5c056b8..86c8674 100644
--- a/arch/x86/hyperv/hv_apic.c
+++ b/arch/x86/hyperv/hv_apic.c
@@ -194,10 +194,20 @@ do_ex_hypercall:
 
 static bool __send_ipi_one(int cpu, int vector)
 {
-	struct cpumask mask = CPU_MASK_NONE;
+	int vp = hv_cpu_number_to_vp_number(cpu);
 
-	cpumask_set_cpu(cpu, &mask);
-	return __send_ipi_mask(&mask, vector);
+	trace_hyperv_send_ipi_one(cpu, vector);
+
+	if (!hv_hypercall_pg || (vp == VP_INVAL))
+		return false;
+
+	if ((vector < HV_IPI_LOW_VECTOR) || (vector > HV_IPI_HIGH_VECTOR))
+		return false;
+
+	if (vp >= 64)
+		return __send_ipi_mask_ex(cpumask_of(cpu), vector);
+
+	return !hv_do_fast_hypercall16(HVCALL_SEND_IPI, vector, BIT_ULL(vp));
 }
 
 static void hv_send_ipi(int cpu, int vector)
diff --git a/arch/x86/include/asm/trace/hyperv.h b/arch/x86/include/asm/trace/hyperv.h
index ace464f..4d705cb 100644
--- a/arch/x86/include/asm/trace/hyperv.h
+++ b/arch/x86/include/asm/trace/hyperv.h
@@ -71,6 +71,21 @@ TRACE_EVENT(hyperv_send_ipi_mask,
 		      __entry->ncpus, __entry->vector)
 	);
 
+TRACE_EVENT(hyperv_send_ipi_one,
+	    TP_PROTO(int cpu,
+		     int vector),
+	    TP_ARGS(cpu, vector),
+	    TP_STRUCT__entry(
+		    __field(int, cpu)
+		    __field(int, vector)
+		    ),
+	    TP_fast_assign(__entry->cpu = cpu;
+			   __entry->vector = vector;
+		    ),
+	    TP_printk("cpu %d vector %x",
+		      __entry->cpu, __entry->vector)
+	);
+
 #endif /* CONFIG_HYPERV */
 
 #undef TRACE_INCLUDE_PATH
