From: "Radim Krčmář" <rkrcmar@redhat.com>
To: Paolo Bonzini <pbonzini@redhat.com>
Cc: linux-kernel@vger.kernel.org, kvm@vger.kernel.org,
Nadav Amit <namit@cs.technion.ac.il>,
Gleb Natapov <gleb@kernel.org>
Subject: Re: [PATCH 8/8] KVM: x86: simplify kvm_apic_map
Date: Fri, 30 Jan 2015 17:57:47 +0100 [thread overview]
Message-ID: <20150130165747.GD27414@potion.redhat.com> (raw)
In-Reply-To: <54CBA1F9.6070206@redhat.com>
2015-01-30 16:23+0100, Paolo Bonzini:
> On 30/01/2015 16:14, Radim Krčmář wrote:
> > > > + case KVM_APIC_MODE_XAPIC_FLAT:
> > > > + *cid = 0;
> > > > + *lid = ldr & 0xff;
> > > > + return true;
> > > > + case KVM_APIC_MODE_XAPIC_CLUSTER:
> > > > + *cid = (ldr >> 4) & 0xf;
> > > > + *lid = ldr & 0xf;
> > > > + return true;
> > > > + case KVM_APIC_MODE_X2APIC:
> > > > + *cid = ldr >> 16;
> > > > + *lid = ldr & 0xffff;
> > > > + return true;
> > > > + }
> >
> >> > lid_bits = mode;
> >> > cid_bits = mode & (16 | 4);
> >> > lid_mask = (1 << lid_bits) - 1;
> >> > cid_mask = (1 << cid_bits) - 1;
> >> >
> >> > *cid = (ldr >> lid_bits) & cid_mask;
> >> > *lid = ldr & lid_mask;
> > Would jump predictor fail on the switch? Or is size of the code that
> > important? This code is shorter, but is going to execute far more
> > operations, so I think it would be slower ... (And harder to read.)
>
> Considering the additional comparisons for the switch, I don't think
> it's going to execute far more operations...
(A quick assembly comparison [1].)
As optimizations go, we could drop the "&" on cid as "cid < 16" is
checked later, so mode=4 practically does nothing ... Not the best for
future bugs, but still pretty safe -- only x2APIC can set a value that
would require the "&" and it can't have valid XAPIC mode, so u32 still
protects us enough.
*cid = ldr >> mode;
*lid = ldr & ((1 << mode) - 1);
Which is probably faster a solution where we don't shuffle switch cases
to jump to x2APIC first. A comparison is at the very bottom [2].
Would that be ok in v2?
---
1: To make it easier, it wasn't inlined, sorry.
There are four parts, first one compares the whole functions and
three subsequent compare each switch case, with common head/tail
omitted. (We don't care about performance when the map is invalid.)
The optimized version is always first and next to it is the old one.
0000000000001700 <apic_logical_id>: (ALI from now on)
1700: callq 1705 <ALI+0x5> 1700: callq 1705 <ALI+0x5>
1705: push %rbp 1705: movzbl 0x10(%rdi),%eax
1706: movzbl 0x10(%rdi),%r8d 1709: push %rbp
170b: mov %rcx,%r10 170a: mov %rsp,%rbp
170e: xor %eax,%eax 170d: cmp $0x8,%al
1710: mov %rsp,%rbp 170f: je 1758 <ALI+0x58>
1713: lea -0x1(%r8),%ecx 1711: cmp $0x10,%al
1717: test %r8d,%ecx 1713: je 1740 <ALI+0x40>
171a: jne 1752 <ALI+0x52> 1715: cmp $0x4,%al
171c: mov %r8d,%edi 1717: je 1720 <ALI+0x20>
171f: mov $0x1,%eax 1719: xor %eax,%eax
1724: mov %esi,%r8d 171b: pop %rbp
1727: mov %edi,%ecx 171c: retq
1729: mov %eax,%r9d 171d: nopl (%rax)
172c: shr %cl,%r8d 1720: mov %esi,%eax
172f: and $0x14,%ecx 1722: and $0xf,%esi
1732: shl %cl,%r9d 1725: shr $0x4,%eax
1735: mov %edi,%ecx 1728: and $0xf,%eax
1737: shl %cl,%eax 172b: mov %ax,(%rdx)
1739: sub $0x1,%r9d 172e: mov $0x1,%eax
173d: sub $0x1,%eax 1733: mov %si,(%rcx)
1740: and %r9d,%r8d 1736: pop %rbp
1743: and %esi,%eax 1737: retq
1745: mov %r8w,(%rdx) 1738: nopl 0x0(%rax,%rax,1)
1749: mov %ax,(%r10) 173f:
174d: mov $0x1,%eax 1740: mov %esi,%eax
1752: pop %rbp 1742: shr $0x10,%eax
1753: retq 1745: mov %ax,(%rdx)
1748: mov $0x1,%eax
174d: mov %si,(%rcx)
1750: pop %rbp
1751: retq
1752: nopw 0x0(%rax,%rax,1)
1758: xor %eax,%eax
175a: and $0xff,%si
175f: mov %ax,(%rdx)
1762: mov $0x1,%eax
1767: mov %si,(%rcx)
176a: pop %rbp
176b: retq
170b: mov %rcx,%r10 170a: mov %rsp,%rbp
170e: xor %eax,%eax 170d: cmp $0x8,%al
1710: mov %rsp,%rbp 170f: je 1758 <ALI+0x58>
1713: lea -0x1(%r8),%ecx [jump]
1717: test %r8d,%ecx 1758: xor %eax,%eax
171a: jne 1752 <ALI+0x52> 175a: and $0xff,%si
171c: mov %r8d,%edi 175f: mov %ax,(%rdx)
171f: mov $0x1,%eax 1762: mov $0x1,%eax
1724: mov %esi,%r8d 1767: mov %si,(%rcx)
1727: mov %edi,%ecx
1729: mov %eax,%r9d
172c: shr %cl,%r8d
172f: and $0x14,%ecx
1732: shl %cl,%r9d
1735: mov %edi,%ecx
1737: shl %cl,%eax
1739: sub $0x1,%r9d
173d: sub $0x1,%eax
1740: and %r9d,%r8d
1743: and %esi,%eax
1745: mov %r8w,(%rdx)
1749: mov %ax,(%r10)
174d: mov $0x1,%eax
170b: mov %rcx,%r10 170a: mov %rsp,%rbp
170e: xor %eax,%eax 170d: cmp $0x8,%al
1710: mov %rsp,%rbp 170f: je 1758 <ALI+0x58>
1713: lea -0x1(%r8),%ecx 1711: cmp $0x10,%al
1717: test %r8d,%ecx 1713: je 1740 <ALI+0x40>
171a: jne 1752 <ALI+0x52> [jump]
171c: mov %r8d,%edi 1740: mov %esi,%eax
171f: mov $0x1,%eax 1742: shr $0x10,%eax
1724: mov %esi,%r8d 1745: mov %ax,(%rdx)
1727: mov %edi,%ecx 1748: mov $0x1,%eax
1729: mov %eax,%r9d 174d: mov %si,(%rcx)
172c: shr %cl,%r8d
172f: and $0x14,%ecx
1732: shl %cl,%r9d
1735: mov %edi,%ecx
1737: shl %cl,%eax
1739: sub $0x1,%r9d
173d: sub $0x1,%eax
1740: and %r9d,%r8d
1743: and %esi,%eax
1745: mov %r8w,(%rdx)
1749: mov %ax,(%r10)
174d: mov $0x1,%eax
170b: mov %rcx,%r10 170a: mov %rsp,%rbp
170e: xor %eax,%eax 170d: cmp $0x8,%al
1710: mov %rsp,%rbp 170f: je 1758 <ALI+0x58>
1713: lea -0x1(%r8),%ecx 1711: cmp $0x10,%al
1717: test %r8d,%ecx 1713: je 1740 <ALI+0x40>
171a: jne 1752 <ALI+0x52> 1715: cmp $0x4,%al
171c: mov %r8d,%edi 1717: je 1720 <ALI+0x20>
171f: mov $0x1,%eax [jump]
1724: mov %esi,%r8d 1720: mov %esi,%eax
1727: mov %edi,%ecx 1722: and $0xf,%esi
1729: mov %eax,%r9d 1725: shr $0x4,%eax
172c: shr %cl,%r8d 1728: and $0xf,%eax
172f: and $0x14,%ecx 172b: mov %ax,(%rdx)
1732: shl %cl,%r9d 172e: mov $0x1,%eax
1735: mov %edi,%ecx 1733: mov %si,(%rcx)
1737: shl %cl,%eax
1739: sub $0x1,%r9d
173d: sub $0x1,%eax
1740: and %r9d,%r8d
1743: and %esi,%eax
1745: mov %r8w,(%rdx)
1749: mov %ax,(%r10)
174d: mov $0x1,%eax
2: A comparison of aggressively optimized ALI with the worst case
scenario, where x2APIC is the third jump.
16e5: mov %rcx,%r9 170a: mov %rsp,%rbp
16ed: xor %eax,%eax 170d: cmp $0x8,%al
16ef: mov %rsp,%rbp 170f: je 1758 <ALI+0x58>
16f2: lea -0x1(%rcx),%r8d 1711: cmp $0x10,%al
16f6: test %ecx,%r8d 1713: je 1740 <ALI+0x40>
16f9: jne 1717 <ALI+0x37> 1715: cmp $0x4,%al
16fb: mov %esi,%eax 1717: je 1720 <ALI+0x20>
16fd: shr %cl,%eax [jump]
16ff: mov %ax,(%rdx) 1720: mov %esi,%eax
1702: mov $0x1,%eax 1722: and $0xf,%esi
1707: shl %cl,%eax 1725: shr $0x4,%eax
1709: sub $0x1,%eax 1728: and $0xf,%eax
170c: and %esi,%eax 172b: mov %ax,(%rdx)
170e: mov %ax,(%r9) 172e: mov $0x1,%eax
1712: mov $0x1,%eax 1733: mov %si,(%rcx)
next prev parent reply other threads:[~2015-01-30 16:58 UTC|newest]
Thread overview: 34+ messages / expand[flat|nested] mbox.gz Atom feed top
2015-01-29 21:48 [PATCH 0/8] KVM: minor APIC fixes and cleanups Radim Krčmář
2015-01-29 21:48 ` [PATCH 1/8] KVM: x86: return bool from kvm_apic_match*() Radim Krčmář
2015-01-29 22:10 ` Joe Perches
2015-01-30 8:50 ` Paolo Bonzini
2015-01-29 21:48 ` [PATCH 2/8] KVM: x86: cleanup kvm_apic_match_*() Radim Krčmář
2015-01-30 8:52 ` Paolo Bonzini
2015-01-30 13:06 ` Radim Krčmář
2015-02-02 14:26 ` Radim Krčmář
2015-02-02 14:28 ` Paolo Bonzini
2015-02-02 14:30 ` Radim Krčmář
2015-02-02 14:29 ` Radim Krčmář
2015-01-29 21:48 ` [PATCH 3/8] KVM: x86: replace 0 with APIC_DEST_PHYSICAL Radim Krčmář
2015-01-29 21:48 ` [PATCH 4/8] KVM: x86: fix x2apic logical address matching Radim Krčmář
2015-01-29 21:48 ` [PATCH 5/8] KVM: x86: use MDA for interrupt matching Radim Krčmář
2015-01-30 9:03 ` Paolo Bonzini
2015-01-30 13:09 ` Radim Krčmář
2015-01-29 21:48 ` [PATCH 6/8] KVM: x86: allow mixed APIC mode broadcast Radim Krčmář
2015-01-29 21:48 ` [PATCH 7/8] KVM: x86: avoid logical_map when it is invalid Radim Krčmář
2015-01-30 9:19 ` Paolo Bonzini
2015-01-30 14:21 ` Radim Krčmář
2015-01-30 14:21 ` Paolo Bonzini
2015-01-30 9:38 ` Paolo Bonzini
2015-01-30 14:56 ` Radim Krčmář
2015-01-30 15:10 ` Paolo Bonzini
2015-01-30 17:09 ` Radim Krčmář
2015-01-29 21:48 ` [PATCH 8/8] KVM: x86: simplify kvm_apic_map Radim Krčmář
2015-01-30 9:18 ` Paolo Bonzini
2015-01-30 15:14 ` Radim Krčmář
2015-01-30 15:23 ` Paolo Bonzini
2015-01-30 16:57 ` Radim Krčmář [this message]
2015-01-30 21:15 ` Paolo Bonzini
2015-01-30 9:22 ` [PATCH 0/8] KVM: minor APIC fixes and cleanups Paolo Bonzini
2015-01-30 15:20 ` Radim Krčmář
2015-01-30 15:24 ` Paolo Bonzini
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20150130165747.GD27414@potion.redhat.com \
--to=rkrcmar@redhat.com \
--cc=gleb@kernel.org \
--cc=kvm@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=namit@cs.technion.ac.il \
--cc=pbonzini@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.