Date: Thu, 10 Jan 2019 14:32:07 -0600
From: Josh Poimboeuf
To: Nadav Amit
Cc: X86 ML, LKML, Ard Biesheuvel, Andy Lutomirski, Steven Rostedt,
	Peter Zijlstra, Ingo Molnar, Thomas Gleixner, Linus Torvalds,
	Masami Hiramatsu, Jason Baron, Jiri Kosina, David Laight,
	Borislav Petkov, Julia Cartwright, Jessica Yu, "H. Peter Anvin",
	Rasmus Villemoes, Edward Cree, Daniel Bristot de Oliveira
Subject: Re: [PATCH v3 0/6] Static calls
Message-ID: <20190110203207.3k43gt4kcvry7us7@treble>
References: <20190110164401.g747vifrppbhbo3o@treble>
	<20190110181807.irh2b7fk6at43rdl@treble>
	<3F89FB6B-DA8B-4C71-B825-2B7EB86F274E@vmware.com>
In-Reply-To: <3F89FB6B-DA8B-4C71-B825-2B7EB86F274E@vmware.com>

On Thu, Jan 10, 2019 at 07:45:26PM +0000, Nadav Amit wrote:
> >> I’m not GCC expert either and writing this code was not making me full of
> >> joy, etc.. I’ll be happy that my code would be reviewed, but it does work. I
> >> don’t think an early pass is needed, as long as hardware registers were not
> >> allocated.
> >>
> >>> Would it work with more than 5 arguments, where args get passed on the
> >>> stack?
> >>
> >> It does.
> >>
> >>> At the very least, it would (at least partially) defeat the point of the
> >>> callee-saved paravirt ops.
> >>
> >> Actually, I think you can even deal with callee-saved functions and remove
> >> all the (terrible) macros. You would need to tell the extension not to
> >> clobber the registers through a new attribute.
> >
> > Ok, it does sound interesting then. I assume you'll be sharing the
> > code?
>
> Of course. If this what is going to convince, I’ll make a small version for
> PV callee-saved first.

It wasn't *only* the PV callee-saved part which interested me, so if you
already have something which implements the other parts, I'd still like
to see it.

> >>> What if we just used a plugin in a simpler fashion -- to do call site
> >>> alignment, if necessary, to ensure the instruction doesn't cross
> >>> cacheline boundaries. This could be done in a later pass, with no side
> >>> effects other than code layout. And it would allow us to avoid
> >>> breakpoints altogether -- again, assuming somebody can verify that
> >>> intra-cacheline call destination writes are atomic with respect to
> >>> instruction decoder reads.
> >>
> >> The plugin should not be able to do so. Layout of the bytecode is done by
> >> the assembler, so I don’t think a plugin would help you with this one.
> >
> > Actually I think we could use .bundle_align_mode for this purpose:
> >
> >   https://sourceware.org/binutils/docs-2.31/as/Bundle-directives.html
>
> Hm… I don’t understand what you have in mind (i.e., when would this
> assembly directives would be emitted).

For example, it could replace

  callq ____static_call_tramp_my_key

with

  .bundle_align_mode 6
  callq ____static_call_tramp_my_key
  .bundle_align_mode 0

which ensures the instruction is within a cache line, aligning it with
NOPs if necessary. That would allow my current implementation to
upgrade out-of-line calls to inline calls 100% of the time, instead of
95% of the time.

-- 
Josh
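
To make the alignment arithmetic concrete, here is a minimal sketch in C of
what ".bundle_align_mode 6" amounts to for a single call site. It is only an
illustration, not code from the patch set: the helper names (crosses_bundle,
bundle_padding) and the constants are hypothetical, and it assumes a 5-byte
"callq rel32" encoding and 64-byte cache lines (2^6, matching the directive's
argument).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed values for illustration only. */
#define BUNDLE_SHIFT  6u                              /* as in .bundle_align_mode 6 */
#define BUNDLE_SIZE   ((uintptr_t)1 << BUNDLE_SHIFT)  /* 64 bytes */
#define CALL_INSN_LEN 5u                              /* opcode 0xe8 + 4-byte rel32 */

/* True if an instruction of `len` bytes at `addr` straddles a bundle boundary. */
static bool crosses_bundle(uintptr_t addr, size_t len)
{
	uintptr_t first = addr >> BUNDLE_SHIFT;
	uintptr_t last = (addr + len - 1) >> BUNDLE_SHIFT;

	return first != last;
}

/*
 * Number of padding (NOP) bytes needed in front of the instruction so that
 * it starts on the next bundle boundary; 0 if it already fits.
 */
static size_t bundle_padding(uintptr_t addr, size_t len)
{
	assert(len > 0 && len <= BUNDLE_SIZE);

	if (!crosses_bundle(addr, len))
		return 0;

	return (size_t)(BUNDLE_SIZE - (addr & (BUNDLE_SIZE - 1)));
}
```

With these assumptions, a 5-byte call starting at offsets 60 through 63 of a
64-byte line straddles the boundary (4 of the 64 possible start offsets), and
the padding moves it to the start of the next line.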