From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
	aws-us-west-2-korg-lkml-1.web.codeaurora.org
X-Spam-Level:
X-Spam-Status: No, score=-1.1 required=3.0 tests=DKIMWL_WL_HIGH,DKIM_SIGNED,
	DKIM_VALID,DKIM_VALID_AU,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS
	autolearn=unavailable autolearn_force=no version=3.4.0
Received: from mail.kernel.org (mail.kernel.org [198.145.29.99])
	by smtp.lore.kernel.org (Postfix) with ESMTP id E0AB0C48BD7
	for ; Tue, 25 Jun 2019 20:15:34 +0000 (UTC)
Received: from vger.kernel.org (vger.kernel.org [209.132.180.67])
	by mail.kernel.org (Postfix) with ESMTP id B9845208CB
	for ; Tue, 25 Jun 2019 20:15:34 +0000 (UTC)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org;
	s=default; t=1561493734;
	bh=Non/zwPybF2AGuhmjwFgxmMCpYW8ogfnhbNrhDKk/vc=;
	h=References:In-Reply-To:From:Date:Subject:To:Cc:List-ID:From;
	b=SagX5FfuMzD1iy+e8PBld8hek6tleTrIA2vgze48tZILMFkziS9d8G+iWaxQG38AD
	 Hs4RjtsxKrjpzQBd/LcVMykaW/pU8tZxKJzKfkIFaIFyBvNVG6yfg2cK0JuMs/xXMh
	 DNh9bYh8/WklCNvkf0dObuetTru2o3RoVlC6YFzc=
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1727022AbfFYUPd (ORCPT ); Tue, 25 Jun 2019 16:15:33 -0400
Received: from mail.kernel.org ([198.145.29.99]:44066 "EHLO mail.kernel.org"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S1726628AbfFYUPd (ORCPT ); Tue, 25 Jun 2019 16:15:33 -0400
Received: from mail-wr1-f47.google.com (mail-wr1-f47.google.com [209.85.221.47])
	(using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits))
	(No client certificate requested)
	by mail.kernel.org (Postfix) with ESMTPSA id 208DF216E3
	for ; Tue, 25 Jun 2019 20:15:32 +0000 (UTC)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org;
	s=default; t=1561493732;
	bh=Non/zwPybF2AGuhmjwFgxmMCpYW8ogfnhbNrhDKk/vc=;
	h=References:In-Reply-To:From:Date:Subject:To:Cc:From;
	b=lLPnssPRCinSZhMedXFl4IX9MsXTNhVSxFwL4KFntKi0pXjqZtp295xgMzrg8LGbO
	 mOUGpRdpbB/L63NxLJu57zUIKU8V40nkRaOVFZ6w0Dr1RtDyGNFekX200zDZun+Q1n
	 hwMQ4fQRt/bJqYXOD6mBo2LKulD7skPGobAZzDmU=
Received: by mail-wr1-f47.google.com with SMTP id n4so73090wrs.3
	for ; Tue, 25 Jun 2019 13:15:32 -0700 (PDT)
X-Gm-Message-State: APjAAAWYLeOVhVybsUR2qojRV1378sjHfofXns6VNbKUYuLERvBKslhY
	JwfOQtQkz6Zl0O+fTSeNkjSckFiLA1Au2Vq8VxtgIg==
X-Google-Smtp-Source: APXvYqyhHtmITJg1+Pl3hirpl0QwbJ++nvN6wGzhM8y02SUtgxU8TIY6h1kuPsgr5idu476pi12gfJv0CJukh1rt4Oc=
X-Received: by 2002:adf:cc85:: with SMTP id p5mr17560wrj.47.1561493730555;
	Tue, 25 Jun 2019 13:15:30 -0700 (PDT)
MIME-Version: 1.0
References: <20190624133607.GI29497@fuggles.cambridge.arm.com>
	<20190625161804.38713-1-vincenzo.frascino@arm.com>
In-Reply-To:
From: Andy Lutomirski
Date: Tue, 25 Jun 2019 13:15:19 -0700
X-Gmail-Original-Message-ID:
Message-ID:
Subject: Re: [PATCH 1/3] lib/vdso: Delay mask application in do_hres()
To: Thomas Gleixner
Cc: Vincenzo Frascino , linux-arch , LAK , LKML ,
	linux-mips@vger.kernel.org, "open list:KERNEL SELFTEST FRAMEWORK" ,
	Catalin Marinas , Will Deacon , Arnd Bergmann , Russell King ,
	Ralf Baechle , Paul Burton , Daniel Lezcano , Mark Salyzyn ,
	Peter Collingbourne , Shuah Khan , Dmitry Safonov <0x7f454c46@gmail.com>,
	Rasmus Villemoes , Huw Davies , Shijith Thotton , Andre Przywara ,
	Andy Lutomirski , Peter Zijlstra
Content-Type: text/plain; charset="UTF-8"
Sender: linux-kernel-owner@vger.kernel.org
Precedence: bulk
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue, Jun 25, 2019 at 11:27 AM Thomas Gleixner wrote:
>
> On Tue, 25 Jun 2019, Thomas Gleixner wrote:
>
> > On Tue, 25 Jun 2019, Vincenzo Frascino wrote:
> >
> > CC+ Andy
> >
> > > do_hres() in the vDSO generic library masks the hw counter value
> > > immediately after reading it.
> > >
> > > Postpone the mask application after checking if the syscall fallback is
> > > enabled, in order to be able to detect a possible fallback for the
> > > architectures that have masks smaller than ULLONG_MAX.
> >
> > Right. This only worked on x86 because the mask is there ULLONG_MAX for all
> > VDSO capable clocksources, i.e. that ever worked just by chance.
> >
> > As we talked about that already yesterday, I tested this on a couple of
> > machines and as expected the outcome is uarch dependent. Minimal deviations
> > to both sides and some machines do not show any change at all. I doubt it's
> > possible to come up with a solution which makes all uarchs go faster
> > magically.
> >
> > Though, thinking about it, we could remove the mask operation completely on
> > X86. /me runs tests
>
> Unsurprisingly the results vary. Two uarchs do not care, but they did not
> care about moving the mask either. The other two gain performance and the
> last one falls back to the state before moving the mask. So in general it
> looks like a worthwhile optimization.

At one point, I contemplated a different approach: have the "get the
counter" routine return 0 and then do if (unlikely(cycles <= last))
goto fallback. This will remove one branch from the hot path. I got
dubious results when I tried benchmarking it, probably because the
branch in question was always correctly predicted.