From: Vitaly Kuznetsov
To: Roman Kagan
Cc: "linux-hyperv@vger.kernel.org", "linux-kernel@vger.kernel.org",
    "x86@kernel.org", "K. Y. Srinivasan", Haiyang Zhang, Stephen Hemminger,
    Sasha Levin, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
    "H. Peter Anvin", Michael Kelley
Subject: Re: [PATCH] x86/hyper-v: micro-optimize send_ipi_one case
In-Reply-To: <20191024163204.GA4673@rkaganb.sw.ru>
References: <20191024152152.25577-1-vkuznets@redhat.com> <20191024163204.GA4673@rkaganb.sw.ru>
Date: Fri, 25 Oct 2019 12:44:07 +0200
Message-ID: <87r231xfyg.fsf@vitty.brq.redhat.com>
X-Mailing-List: linux-kernel@vger.kernel.org

Roman Kagan writes:

> On Thu, Oct 24, 2019 at 05:21:52PM +0200, Vitaly Kuznetsov wrote:
>> When sending an IPI to a single CPU there is no need to deal with cpumasks.
>> With 2 CPU guest on WS2019 I'm seeing a minor (like 3%, 8043 -> 7761 CPU
>> cycles) improvement with smp_call_function_single() loop benchmark. The
>> optimization, however, is tiny and straightforward.
>> Also, send_ipi_one() is important for PV spinlock kick.
>>
>> I was also wondering if it would make sense to switch to using regular
>> APIC IPI send for CPU > 64 case but no, it is twice as expensive (12650 CPU
>> cycles for __send_ipi_mask_ex() call, 26000 for orig_apic.send_IPI(cpu,
>> vector)).
>
> Is it with APICv or emulated apic?

That's actually a good question. Yesterday I was testing this on a WS2019
host with a Xeon E5-2420 v2 (Ivy Bridge EN), which I *think* should already
support APICv, but I'm not sure and ark.intel.com is not helpful. Today I
decided to re-test on something more modern and got a WS2016 host with an
E5-2667 v4 (Broadwell). The results are:

'Ex' hypercall: 18000 cycles
orig_apic.send_IPI(): 46000 cycles

I am, however, just assuming that Hyper-V uses APICv when it's available,
and I have no idea how to check this from within the guest. I'm also not
sure whether WS2019 is that much faster or whether other differences
between these hosts matter.

>
>> Signed-off-by: Vitaly Kuznetsov
>> ---
>>  arch/x86/hyperv/hv_apic.c           | 22 +++++++++++++++++++---
>>  arch/x86/include/asm/trace/hyperv.h | 15 +++++++++++++++
>>  2 files changed, 34 insertions(+), 3 deletions(-)
>>
>> diff --git a/arch/x86/hyperv/hv_apic.c b/arch/x86/hyperv/hv_apic.c
>> index e01078e93dd3..847f9d0328fe 100644
>> --- a/arch/x86/hyperv/hv_apic.c
>> +++ b/arch/x86/hyperv/hv_apic.c
>> @@ -194,10 +194,26 @@ static bool __send_ipi_mask(const struct cpumask *mask, int vector)
>>
>>  static bool __send_ipi_one(int cpu, int vector)
>>  {
>> -	struct cpumask mask = CPU_MASK_NONE;
>> +	int ret;
>>
>> -	cpumask_set_cpu(cpu, &mask);
>> -	return __send_ipi_mask(&mask, vector);
>> +	trace_hyperv_send_ipi_one(cpu, vector);
>> +
>> +	if (unlikely(!hv_hypercall_pg))
>> +		return false;
>> +
>> +	if (unlikely((vector < HV_IPI_LOW_VECTOR) ||
>> +		     (vector > HV_IPI_HIGH_VECTOR)))
>> +		return false;
>
> I guess 'unlikely' is unnecessary in these cases.
>

All I can say is that the resulting asm with my gcc is a bit different :-)

>> +
>> +	if (cpu >= 64)
>> +		goto do_ex_hypercall;
>> +
>> +	ret = hv_do_fast_hypercall16(HVCALL_SEND_IPI, vector,
>> +				     BIT_ULL(hv_cpu_number_to_vp_number(cpu)));
>> +	return ((ret == 0) ? true : false);
>
> D'oh. Isn't "return ret == 0;" or just "return ret;" good enough?

That's how we do stuff in __send_ipi_mask() :-) I'll send v2 implementing
Joe's suggestion to drop 'ret' and just do return !hv_do_fast_hypercall16().

>
> These tiny nitpicks are no reason to hold the patch though, so
>
> Reviewed-by: Roman Kagan

Thanks!

-- 
Vitaly
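
For illustration only, here is a user-space sketch of the return-value
simplification discussed above. It is not the v2 patch. The stub
fake_fast_hypercall16(), the fake_status variable and the send_ipi_one_v1()
and send_ipi_one_v2() names are hypothetical and exist only for this
example; the one assumption taken from the thread is that a status of 0
means success.

```c
#include <stdbool.h>

/* Hypothetical: status the stubbed hypercall will report (0 == success). */
static int fake_status;

/*
 * Hypothetical stand-in for the real hypercall helper. The real
 * signature is not reproduced here; only the "0 means success"
 * convention from the thread is modelled.
 */
static int fake_fast_hypercall16(int vector, unsigned long long vp_mask)
{
	(void)vector;
	(void)vp_mask;
	return fake_status;
}

/* Shape of the reviewed code: temporary variable plus ternary. */
static bool send_ipi_one_v1(int vp, int vector)
{
	int ret;

	ret = fake_fast_hypercall16(vector, 1ULL << vp);
	return ((ret == 0) ? true : false);
}

/* Suggested shape: drop 'ret' and negate the status directly. */
static bool send_ipi_one_v2(int vp, int vector)
{
	return !fake_fast_hypercall16(vector, 1ULL << vp);
}
```

Both variants return true exactly when the status is 0. Under the same
convention a plain "return ret;" would not be equivalent, since it would
report true on failure.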