From mboxrd@z Thu Jan 1 00:00:00 1970
From: Andrew Cooper
Subject: Re: [PATCH] x86/current: Provide additional information to optimise get_cpu_info()
Date: Mon, 1 Sep 2014 13:18:39 +0100
Message-ID: <5404641F.4060705@citrix.com>
References: <1409569130-19066-1-git-send-email-andrew.cooper3@citrix.com> <540473A7020000780002F6C2@mail.emea.novell.com>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
In-Reply-To: <540473A7020000780002F6C2@mail.emea.novell.com>
Sender: xen-devel-bounces@lists.xen.org
Errors-To: xen-devel-bounces@lists.xen.org
To: Jan Beulich
Cc: Xen-devel
List-Id: xen-devel@lists.xenproject.org

On 01/09/14 12:24, Jan Beulich wrote:
>>>> On 01.09.14 at 12:58, wrote:
>> Exactly as with c/s d55c5eefe "x86: use compiler visible "add" instead of
>> inline assembly "or" in get_cpu_info()", this is achieved by providing more
>> information to the compiler.
>>
>> With this modification, gcc replaces the older:
>>   mov imm, %reg
>>   and %rsp, %reg
>>
>> with:
>>   mov %rsp, %reg
>>   and imm, %reg
>>
>> which is one byte shorter.
> I'm in no way opposed to the change, but is that really true? Afaict
> it can be 1 byte shorter only when %rax gets selected as the register
> here.

Oh - quite possibly only %rax, but that still makes up the majority of
instances in shorter functions, where %rax was previously chosen as well.

I also note that the exact position of the lookup gets deferred in some
cases until after an early exit from the function.

>
>> It also considers all general purpose registers
>> for %reg rather than just the legacy ones (i.e. will now use %r12 etc), which
>> allows for better register scheduling in larger functions.
> Same here - why would with the old code not all registers be
> available for selection by the compiler?

I suspect it has something to do with the choices available from the asm
parameter.
There are no mnemonics to specify the newer registers, which is a
holdover from the 32bit days. I suspect there is some implicit limit to
just the legacy GPRs.

Either way, my observation of the change in generated asm is that before
the change, no REX.R registers were used, whereas they are used
afterwards.

~Andrew
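
For anyone following the thread without the patch to hand, here is a
minimal sketch of the stack-pointer masking being discussed, written as
plain C so the compiler can see the whole computation. It is not the code
from the patch: STACK_SIZE, the struct cpu_info fields and the helper name
cpu_info_addr() are all hypothetical, and the stack pointer is passed in
as an ordinary argument rather than read from %rsp, purely so the
arithmetic can be checked on its own.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical values for illustration; not Xen's actual definitions. */
#define STACK_SIZE ((uintptr_t)8 * 4096)

struct cpu_info {
    unsigned int processor_id;   /* placeholder field */
    void *current_vcpu;          /* placeholder field */
};

static_assert((STACK_SIZE & (STACK_SIZE - 1)) == 0,
              "the masking trick needs a power-of-two stack size");

/*
 * Given any address inside a STACK_SIZE-aligned stack, return the address
 * of the cpu_info block assumed to sit at the very top of that stack.
 */
static inline uintptr_t cpu_info_addr(uintptr_t sp)
{
    /* This is the "and imm, %reg" step from the thread, expressed in C. */
    uintptr_t base = sp & ~(STACK_SIZE - 1);

    return base + STACK_SIZE - sizeof(struct cpu_info);
}
```

Every address within the same aligned stack maps to the same result,
which is why the lookup only needs the current stack pointer. Because the
mask is an ordinary C expression here, instruction order and register
choice are left to the compiler instead of being fixed by an asm
statement.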