Message-ID: <6935bdd8-6b4a-80f6-d134-768dc0d37abe@srcf.net>
Date: Thu, 18 Nov 2021 00:32:14 +0000
From: Andrew Cooper
To: Jan Beulich, Andrew Cooper
Cc: Roger Pau Monné, Wei Liu, Xen-devel
Subject: Re: [PATCH 5/5] x86/ioapic: Drop function pointers from __ioapic_{read,write}_entry()
References: <20211111175740.23480-1-andrew.cooper3@citrix.com> <20211111175740.23480-6-andrew.cooper3@citrix.com>
List-Id: Xen developer discussion

On 12/11/2021 10:43, Jan Beulich wrote:
> On 11.11.2021 18:57, Andrew Cooper wrote:
>> Function pointers are expensive, and the raw parameter is a constant from all
>> callers, meaning that it predicts very well with local branch history.
> The code change is fine, but I'm having trouble with "all" here: Both
> functions aren't even static, so while callers in io_apic.c may
> benefit (perhaps with the exception of ioapic_{read,write}_entry(),
> depending on whether the compiler views inlining them as warranted),
> I'm in no way convinced this extends to the callers in VT-d code.
>
> Further ISTR clang being quite a bit less aggressive about inlining,
> so the effects might not be quite as good there even for the call
> sites in io_apic.c.
>
> Can you clarify this for me please?

The way the compiler lays out the code is unrelated to why this form is
an improvement.

Branch history is a function of "the $N most recently taken branches".
This is because "how you got here" is typically relevant to "where you
should go next".

Trivial schemes maintain a shift register of taken / not-taken results.
Less trivial schemes maintain a rolling hash of (src addr, dst addr)
tuples of all taken branches (direct and indirect). In both cases, the
instantaneous branch history is an input into the final prediction, and
is commonly used to select which saturating counter (or bank of
counters) is used.

Consider something like

while ( cond )
{
    memcpy(dst1, src1, 64);
    memcpy(dst2, src2, 7);
}

Here, the conditional jump inside memcpy() coping with the tail of the
copy flips result 50% of the time, which is fiendish to predict.

However, because the branch history differs (by memcpy()'s return
address, which was accumulated by the call instruction), the predictor
can actually use two different taken/not-taken counters for the two
different "instances" of the tail jump. After a few iterations to warm
up, the predictor will get every jump perfect despite the fact that
memcpy() is a library call and the branches would otherwise alias.

Bringing it back to the code in question. The "raw" parameter is an
explicit true or false at the top of all call paths leading into these
functions.
Therefore, an individual branch history has a high correlation with said
true or false, irrespective of the absolute code layout. As a
consequence, the correct result of the prediction is highly correlated
with the branch history, and it will predict perfectly[1] after a few
times the path has been used.

~Andrew

[1] Obviously, it's not actually perfect outside of a synthetic example.
Aliasing in the predictor is a necessary property of keeping the logic
small enough to provide an answer fast, but the less accidental aliasing
there is, the faster the CPU performance in benchmarks, so incentives
are in our favour here.
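
For anyone who wants to play with the memcpy() example, here is a toy
model of it. It is a sketch under loud assumptions, not a description of
any real predictor: the "history" is just the most recent call site
rather than a rolling hash of many taken branches, the two call-site
addresses are made up, and simulate(), slot() and struct result are
names invented for this illustration. It compares one 2-bit saturating
counter selected by branch address alone against a small table of such
counters selected by the toy history.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical addresses of the two call instructions into memcpy(). */
#define CALL_SITE_A 0x1000u /* memcpy(dst1, src1, 64): tail branch not taken */
#define CALL_SITE_B 0x1010u /* memcpy(dst2, src2, 7):  tail branch taken */

#define TABLE_SIZE 16u

struct result {
    unsigned int plain_miss; /* one counter, selected by branch address only */
    unsigned int hist_miss;  /* counter selected by the (toy) branch history */
};

/* 2-bit saturating counter: 0 and 1 predict not-taken, 2 and 3 taken. */
static int predict(uint8_t ctr)
{
    return ctr >= 2;
}

static uint8_t train(uint8_t ctr, int taken)
{
    if ( taken )
        return ctr < 3 ? ctr + 1 : 3;

    return ctr > 0 ? ctr - 1 : 0;
}

/* Toy history: only the most recent call site, folded into a table index. */
static unsigned int slot(uint32_t call_site)
{
    return (call_site >> 2) & (TABLE_SIZE - 1);
}

/* Run the loop from the example and count mispredicts of the tail branch. */
struct result simulate(unsigned int iterations)
{
    static const uint32_t site[2] = { CALL_SITE_A, CALL_SITE_B };
    static const int taken[2] = { 0, 1 };
    uint8_t plain = 1, table[TABLE_SIZE];
    struct result r = { 0, 0 };
    unsigned int i, j;

    /* The illustration only holds if the two histories do not alias. */
    assert(slot(CALL_SITE_A) != slot(CALL_SITE_B));

    for ( i = 0; i < TABLE_SIZE; i++ )
        table[i] = 1; /* weakly not-taken */

    for ( i = 0; i < iterations; i++ )
        for ( j = 0; j < 2; j++ )
        {
            uint8_t *ctr = &table[slot(site[j])];

            r.plain_miss += predict(plain) != taken[j];
            r.hist_miss += predict(*ctr) != taken[j];

            plain = train(plain, taken[j]);
            *ctr = train(*ctr, taken[j]);
        }

    return r;
}
```

Under these assumptions, simulate(1000) reports 1000 mispredicts for the
address-only counter (it gets the tail branch wrong once per loop
iteration) and a single warm-up mispredict for the history-selected
counters.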