Linux-HyperV Archive on lore.kernel.org
* [PATCH v2] x86/hyper-v: micro-optimize send_ipi_one case
@ 2019-10-25 13:15 Vitaly Kuznetsov
  2019-10-25 14:05 ` Roman Kagan
  2019-10-25 17:16 ` Michael Kelley
  0 siblings, 2 replies; 4+ messages in thread
From: Vitaly Kuznetsov @ 2019-10-25 13:15 UTC
  To: linux-hyperv
  Cc: linux-kernel, x86, K. Y. Srinivasan, Haiyang Zhang,
	Stephen Hemminger, Sasha Levin, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, H. Peter Anvin, Roman Kagan, Michael Kelley,
	Joe Perches

When sending an IPI to a single CPU, there is no need to deal with cpumasks.
With a 2-CPU guest on WS2019 I'm seeing a minor improvement (about 3%,
8043 -> 7761 CPU cycles) in an smp_call_function_single() loop benchmark.
The gain is small, but the optimization is tiny and straightforward. Also,
send_ipi_one() is important for the PV spinlock kick.

I was also wondering whether it would make sense to switch to the regular
APIC IPI send for the CPU >= 64 case, but no: it is twice as expensive
(12650 CPU cycles for a __send_ipi_mask_ex() call vs. 26000 for
orig_apic.send_IPI(cpu, vector)).
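
The relative figures behind the two cycle-count comparisons above, computed
from the quoted numbers only:

```latex
\frac{8043 - 7761}{8043} = \frac{282}{8043} \approx 0.035 \approx 3.5\%,
\qquad
\frac{26000}{12650} \approx 2.06
```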

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
---
Changes since v1:
 - Style changes [Roman, Joe]
---
 arch/x86/hyperv/hv_apic.c           | 13 ++++++++++---
 arch/x86/include/asm/trace/hyperv.h | 15 +++++++++++++++
 2 files changed, 25 insertions(+), 3 deletions(-)

diff --git a/arch/x86/hyperv/hv_apic.c b/arch/x86/hyperv/hv_apic.c
index e01078e93dd3..fd17c6341737 100644
--- a/arch/x86/hyperv/hv_apic.c
+++ b/arch/x86/hyperv/hv_apic.c
@@ -194,10 +194,17 @@ static bool __send_ipi_mask(const struct cpumask *mask, int vector)
 
 static bool __send_ipi_one(int cpu, int vector)
 {
-	struct cpumask mask = CPU_MASK_NONE;
+	trace_hyperv_send_ipi_one(cpu, vector);
 
-	cpumask_set_cpu(cpu, &mask);
-	return __send_ipi_mask(&mask, vector);
+	if (!hv_hypercall_pg || (vector < HV_IPI_LOW_VECTOR) ||
+	    (vector > HV_IPI_HIGH_VECTOR))
+		return false;
+
+	if (cpu >= 64)
+		return __send_ipi_mask_ex(cpumask_of(cpu), vector);
+
+	return !hv_do_fast_hypercall16(HVCALL_SEND_IPI, vector,
+			       BIT_ULL(hv_cpu_number_to_vp_number(cpu)));
 }
 
 static void hv_send_ipi(int cpu, int vector)
diff --git a/arch/x86/include/asm/trace/hyperv.h b/arch/x86/include/asm/trace/hyperv.h
index ace464f09681..4d705cb4d63b 100644
--- a/arch/x86/include/asm/trace/hyperv.h
+++ b/arch/x86/include/asm/trace/hyperv.h
@@ -71,6 +71,21 @@ TRACE_EVENT(hyperv_send_ipi_mask,
 		      __entry->ncpus, __entry->vector)
 	);
 
+TRACE_EVENT(hyperv_send_ipi_one,
+	    TP_PROTO(int cpu,
+		     int vector),
+	    TP_ARGS(cpu, vector),
+	    TP_STRUCT__entry(
+		    __field(int, cpu)
+		    __field(int, vector)
+		    ),
+	    TP_fast_assign(__entry->cpu = cpu;
+			   __entry->vector = vector;
+		    ),
+	    TP_printk("cpu %d vector %x",
+		      __entry->cpu, __entry->vector)
+	);
+
 #endif /* CONFIG_HYPERV */
 
 #undef TRACE_INCLUDE_PATH
-- 
2.20.1
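
The hyperv_send_ipi_one event added above records two ints and prints them
with TP_printk("cpu %d vector %x"). A user-space sketch of how one record
renders, imitating only the format string and not the ftrace machinery; the
struct and function names are hypothetical. Note that the vector comes out
in lowercase hex without a "0x" prefix.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical mirror of the two fields declared in TP_STRUCT__entry(). */
struct demo_ipi_one_entry {
	int cpu;
	int vector;
};

/*
 * Renders a record like TP_printk("cpu %d vector %x", ...): CPU in decimal,
 * vector in lowercase hex without a "0x" prefix. Returns the length of the
 * rendered text, or 0 on an encoding error or if it did not fit in @len.
 */
static size_t demo_render(const struct demo_ipi_one_entry *e,
			  char *buf, size_t len)
{
	int n;

	assert(e != NULL && buf != NULL && len > 0);
	n = snprintf(buf, len, "cpu %d vector %x",
		     e->cpu, (unsigned int)e->vector);
	if (n < 0 || (size_t)n >= len)
		return 0;
	return (size_t)n;
}

/* Returns true if the record renders to exactly @expected. */
static bool demo_renders_as(const struct demo_ipi_one_entry *e,
			    const char *expected)
{
	char buf[64];

	return demo_render(e, buf, sizeof(buf)) != 0 &&
	       strcmp(buf, expected) == 0;
}
```

With this format, a record for CPU 1 and vector 0xfb would show up as
"cpu 1 vector fb" in the event-specific part of the trace line.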



* Re: [PATCH v2] x86/hyper-v: micro-optimize send_ipi_one case
  2019-10-25 13:15 [PATCH v2] x86/hyper-v: micro-optimize send_ipi_one case Vitaly Kuznetsov
@ 2019-10-25 14:05 ` Roman Kagan
  2019-10-25 17:16 ` Michael Kelley
  1 sibling, 0 replies; 4+ messages in thread
From: Roman Kagan @ 2019-10-25 14:05 UTC
  To: Vitaly Kuznetsov
  Cc: linux-hyperv, linux-kernel, x86, K. Y. Srinivasan, Haiyang Zhang,
	Stephen Hemminger, Sasha Levin, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, H. Peter Anvin, Michael Kelley, Joe Perches

On Fri, Oct 25, 2019 at 03:15:46PM +0200, Vitaly Kuznetsov wrote:
> When sending an IPI to a single CPU, there is no need to deal with cpumasks.
> With a 2-CPU guest on WS2019 I'm seeing a minor improvement (about 3%,
> 8043 -> 7761 CPU cycles) in an smp_call_function_single() loop benchmark.
> The gain is small, but the optimization is tiny and straightforward. Also,
> send_ipi_one() is important for the PV spinlock kick.
> 
> I was also wondering whether it would make sense to switch to the regular
> APIC IPI send for the CPU >= 64 case, but no: it is twice as expensive
> (12650 CPU cycles for a __send_ipi_mask_ex() call vs. 26000 for
> orig_apic.send_IPI(cpu, vector)).
> 
> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
> ---
> Changes since v1:
>  - Style changes [Roman, Joe]
> ---
>  arch/x86/hyperv/hv_apic.c           | 13 ++++++++++---
>  arch/x86/include/asm/trace/hyperv.h | 15 +++++++++++++++
>  2 files changed, 25 insertions(+), 3 deletions(-)

Reviewed-by: Roman Kagan <rkagan@virtuozzo.com>


* RE: [PATCH v2] x86/hyper-v: micro-optimize send_ipi_one case
  2019-10-25 13:15 [PATCH v2] x86/hyper-v: micro-optimize send_ipi_one case Vitaly Kuznetsov
  2019-10-25 14:05 ` Roman Kagan
@ 2019-10-25 17:16 ` Michael Kelley
  2019-10-25 17:26   ` Vitaly Kuznetsov
  1 sibling, 1 reply; 4+ messages in thread
From: Michael Kelley @ 2019-10-25 17:16 UTC
  To: vkuznets, linux-hyperv
  Cc: linux-kernel, x86, KY Srinivasan, Haiyang Zhang,
	Stephen Hemminger, Sasha Levin, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, H. Peter Anvin, Roman Kagan, Joe Perches

From: Vitaly Kuznetsov <vkuznets@redhat.com>
> 
> When sending an IPI to a single CPU, there is no need to deal with cpumasks.
> With a 2-CPU guest on WS2019 I'm seeing a minor improvement (about 3%,
> 8043 -> 7761 CPU cycles) in an smp_call_function_single() loop benchmark.
> The gain is small, but the optimization is tiny and straightforward. Also,
> send_ipi_one() is important for the PV spinlock kick.
> 
> I was also wondering whether it would make sense to switch to the regular
> APIC IPI send for the CPU >= 64 case, but no: it is twice as expensive
> (12650 CPU cycles for a __send_ipi_mask_ex() call vs. 26000 for
> orig_apic.send_IPI(cpu, vector)).
> 
> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
> ---
> Changes since v1:
>  - Style changes [Roman, Joe]
> ---
>  arch/x86/hyperv/hv_apic.c           | 13 ++++++++++---
>  arch/x86/include/asm/trace/hyperv.h | 15 +++++++++++++++
>  2 files changed, 25 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/x86/hyperv/hv_apic.c b/arch/x86/hyperv/hv_apic.c
> index e01078e93dd3..fd17c6341737 100644
> --- a/arch/x86/hyperv/hv_apic.c
> +++ b/arch/x86/hyperv/hv_apic.c
> @@ -194,10 +194,17 @@ static bool __send_ipi_mask(const struct cpumask *mask, int vector)
> 
>  static bool __send_ipi_one(int cpu, int vector)
>  {
> -	struct cpumask mask = CPU_MASK_NONE;
> +	trace_hyperv_send_ipi_one(cpu, vector);
> 
> -	cpumask_set_cpu(cpu, &mask);
> -	return __send_ipi_mask(&mask, vector);
> +	if (!hv_hypercall_pg || (vector < HV_IPI_LOW_VECTOR) ||
> +	    (vector > HV_IPI_HIGH_VECTOR))
> +		return false;
> +
> +	if (cpu >= 64)
> +		return __send_ipi_mask_ex(cpumask_of(cpu), vector);

The above test should be checking the VP number, not the CPU
number, since the VP number is used to form the bitmap argument
to the hypercall. As far as I am aware, the CPU number and VP number are
the same in all current Hyper-V implementations, but that is not guaranteed
in the future.

Michael

> +
> +	return !hv_do_fast_hypercall16(HVCALL_SEND_IPI, vector,
> +			       BIT_ULL(hv_cpu_number_to_vp_number(cpu)));
>  }
> 
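
To illustrate the review point: the choice between the fast 64-bit mask and
the sparse "_ex" format has to be keyed on the VP number, because that is
the value that becomes a bit in the mask. The sketch below assumes a
hypothetical CPU-to-VP table (standing in for hv_cpu_number_to_vp_number())
in which the two numberings diverge. With the CPU-number check, CPU 2 would
take the fast path even though its VP number (64) does not fit in the mask,
and BIT_ULL(64) would be an out-of-range shift.

```c
#include <assert.h>

#define DEMO_NR_CPUS 4

enum demo_ipi_path { DEMO_FAST_64BIT, DEMO_SPARSE_EX };

/*
 * Hypothetical CPU -> VP table standing in for hv_cpu_number_to_vp_number().
 * The VP numbering deliberately diverges from the Linux CPU numbering.
 */
static const unsigned int demo_cpu_to_vp[DEMO_NR_CPUS] = { 0, 1, 64, 65 };

static unsigned int demo_vp_of(unsigned int cpu)
{
	assert(cpu < DEMO_NR_CPUS);
	return demo_cpu_to_vp[cpu];
}

/* v2 logic: decision keyed on the Linux CPU number (the reported bug). */
static enum demo_ipi_path demo_pick_path_by_cpu(unsigned int cpu)
{
	return cpu >= 64 ? DEMO_SPARSE_EX : DEMO_FAST_64BIT;
}

/* Decision keyed on the VP number, i.e. the bit that goes into the mask. */
static enum demo_ipi_path demo_pick_path_by_vp(unsigned int cpu)
{
	return demo_vp_of(cpu) >= 64 ? DEMO_SPARSE_EX : DEMO_FAST_64BIT;
}
```

Computing the VP number once and using it for both the range check and the
mask bit keeps the two consistent.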


* RE: [PATCH v2] x86/hyper-v: micro-optimize send_ipi_one case
  2019-10-25 17:16 ` Michael Kelley
@ 2019-10-25 17:26   ` Vitaly Kuznetsov
  0 siblings, 0 replies; 4+ messages in thread
From: Vitaly Kuznetsov @ 2019-10-25 17:26 UTC
  To: Michael Kelley, linux-hyperv
  Cc: linux-kernel, x86,
	KY Srinivasan, Haiyang Zhang, Stephen Hemminger, Sasha Levin,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, H. Peter Anvin,
	Roman Kagan, Joe Perches

Michael Kelley <mikelley@microsoft.com> writes:

> From: Vitaly Kuznetsov <vkuznets@redhat.com>
>> 
>> When sending an IPI to a single CPU, there is no need to deal with cpumasks.
>> With a 2-CPU guest on WS2019 I'm seeing a minor improvement (about 3%,
>> 8043 -> 7761 CPU cycles) in an smp_call_function_single() loop benchmark.
>> The gain is small, but the optimization is tiny and straightforward. Also,
>> send_ipi_one() is important for the PV spinlock kick.
>> 
>> I was also wondering whether it would make sense to switch to the regular
>> APIC IPI send for the CPU >= 64 case, but no: it is twice as expensive
>> (12650 CPU cycles for a __send_ipi_mask_ex() call vs. 26000 for
>> orig_apic.send_IPI(cpu, vector)).
>> 
>> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
>> ---
>> Changes since v1:
>>  - Style changes [Roman, Joe]
>> ---
>>  arch/x86/hyperv/hv_apic.c           | 13 ++++++++++---
>>  arch/x86/include/asm/trace/hyperv.h | 15 +++++++++++++++
>>  2 files changed, 25 insertions(+), 3 deletions(-)
>> 
>> diff --git a/arch/x86/hyperv/hv_apic.c b/arch/x86/hyperv/hv_apic.c
>> index e01078e93dd3..fd17c6341737 100644
>> --- a/arch/x86/hyperv/hv_apic.c
>> +++ b/arch/x86/hyperv/hv_apic.c
>> @@ -194,10 +194,17 @@ static bool __send_ipi_mask(const struct cpumask *mask, int vector)
>> 
>>  static bool __send_ipi_one(int cpu, int vector)
>>  {
>> -	struct cpumask mask = CPU_MASK_NONE;
>> +	trace_hyperv_send_ipi_one(cpu, vector);
>> 
>> -	cpumask_set_cpu(cpu, &mask);
>> -	return __send_ipi_mask(&mask, vector);
>> +	if (!hv_hypercall_pg || (vector < HV_IPI_LOW_VECTOR) ||
>> +	    (vector > HV_IPI_HIGH_VECTOR))
>> +		return false;
>> +
>> +	if (cpu >= 64)
>> +		return __send_ipi_mask_ex(cpumask_of(cpu), vector);
>
> The above test should be checking the VP number, not the CPU
> number,

Oops, of course, thanks for catching this! v3 is coming!

>  since the VP number is used to form the bitmap argument
> to the hypercall. As far as I am aware, the CPU number and VP number are
> the same in all current Hyper-V implementations, but that is not guaranteed
> in the future.
>
> Michael
>
>> +
>> +	return !hv_do_fast_hypercall16(HVCALL_SEND_IPI, vector,
>> +			       BIT_ULL(hv_cpu_number_to_vp_number(cpu)));
>>  }
>> 

-- 
Vitaly
