From: Paolo Bonzini <pbonzini@redhat.com>
To: Marcelo Tosatti <mtosatti@redhat.com>, Wanpeng Li <kernellwp@gmail.com>
Cc: LKML <linux-kernel@vger.kernel.org>, kvm <kvm@vger.kernel.org>,
	"Radim Krčmář" <rkrcmar@redhat.com>
Subject: Re: [PATCH v4 2/5] KVM: LAPIC: inject lapic timer interrupt by posted interrupt
Date: Tue, 25 Jun 2019 19:02:53 +0200
Message-ID: <61e43444-f91c-3181-1f59-12a3634bf043@redhat.com>
In-Reply-To: <20190621214205.GA4751@amt.cnet>

On 21/06/19 23:42, Marcelo Tosatti wrote:
> On Fri, Jun 21, 2019 at 09:42:39AM +0800, Wanpeng Li wrote:
>> On Thu, 20 Jun 2019 at 05:04, Marcelo Tosatti <mtosatti@redhat.com> wrote:
>>>
>>> Hi Li,
>>>
>>> On Wed, Jun 19, 2019 at 08:36:06AM +0800, Wanpeng Li wrote:
>>>> On Tue, 18 Jun 2019 at 21:36, Marcelo Tosatti <mtosatti@redhat.com> wrote:
>>>>>
>>>>> On Mon, Jun 17, 2019 at 07:24:44PM +0800, Wanpeng Li wrote:
>>>>>> From: Wanpeng Li <wanpengli@tencent.com>
>>>>>>
>>>>>> Dedicated instances are currently disturbed by unnecessary jitter because
>>>>>> the emulated lapic timers fire on the same pCPUs where the vCPUs reside.
>>>>>> Unlike ARM, Intel has no hardware virtual timer for the guest. Both
>>>>>> programming the timer in the guest and the firing of the emulated timer
>>>>>> incur vmexits. This patch tries to avoid the vmexit incurred when the
>>>>>> emulated timer fires in the dedicated instance scenario.
>>>>>>
>>>>>> When nohz_full is enabled in the dedicated instance scenario, the emulated
>>>>>> timers can be offloaded to the nearest busy housekeeping CPUs, since APICv
>>>>>> has become common in recent years. Once the emulated timer fires, the
>>>>>> housekeeping CPU injects the guest timer interrupt via a posted interrupt.
>>>>>>
>>>>>> The host admin should tune the setup carefully, e.g. a dedicated instance
>>>>>> scenario with nohz_full covering the pCPUs where the vCPUs reside, several
>>>>>> spare pCPUs for busy housekeeping, and mwait/hlt/pause vmexits disabled so
>>>>>> that the guest stays in non-root mode. A ~3% redis performance benefit can
>>>>>> be observed on a Skylake server.
>>>>>>
>>>>>> w/o patch:
>>>>>>
>>>>>>            VM-EXIT  Samples  Samples%   Time%  Min Time  Max Time  Avg time
>>>>>> EXTERNAL_INTERRUPT    42916    49.43%  39.30%    0.47us  106.09us  0.71us ( +-   1.09% )
>>>>>>
>>>>>> w/ patch:
>>>>>>
>>>>>>            VM-EXIT  Samples  Samples%   Time%  Min Time  Max Time  Avg time
>>>>>> EXTERNAL_INTERRUPT     6871     9.29%   2.96%    0.44us   57.88us  0.72us ( +-   4.02% )
>>>>>>
>>>>>> Cc: Paolo Bonzini <pbonzini@redhat.com>
>>>>>> Cc: Radim Krčmář <rkrcmar@redhat.com>
>>>>>> Cc: Marcelo Tosatti <mtosatti@redhat.com>
>>>>>> Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
>>>>>> ---
>>>>>>  arch/x86/kvm/lapic.c            | 33 ++++++++++++++++++++++++++-------
>>>>>>  arch/x86/kvm/lapic.h            |  1 +
>>>>>>  arch/x86/kvm/vmx/vmx.c          |  3 ++-
>>>>>>  arch/x86/kvm/x86.c              |  5 +++++
>>>>>>  arch/x86/kvm/x86.h              |  2 ++
>>>>>>  include/linux/sched/isolation.h |  2 ++
>>>>>>  kernel/sched/isolation.c        |  6 ++++++
>>>>>>  7 files changed, 44 insertions(+), 8 deletions(-)
>>>>>>
>>>>>> diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
>>>>>> index 87ecb56..9ceeee5 100644
>>>>>> --- a/arch/x86/kvm/lapic.c
>>>>>> +++ b/arch/x86/kvm/lapic.c
>>>>>> @@ -122,6 +122,13 @@ static inline u32 kvm_x2apic_id(struct kvm_lapic *apic)
>>>>>>       return apic->vcpu->vcpu_id;
>>>>>>  }
>>>>>>
>>>>>> +bool posted_interrupt_inject_timer(struct kvm_vcpu *vcpu)
>>>>>> +{
>>>>>> +     return pi_inject_timer && kvm_vcpu_apicv_active(vcpu) &&
>>>>>> +             kvm_hlt_in_guest(vcpu->kvm);
>>>>>> +}
>>>>>> +EXPORT_SYMBOL_GPL(posted_interrupt_inject_timer);
>>>>>
>>>>> Paolo, can you explain the reasoning behind this?
>>>>>
>>>>> Should not be necessary...
>>
>> https://lkml.org/lkml/2019/6/5/436  "Here you need to check
>> kvm_halt_in_guest, not kvm_mwait_in_guest, because you need to go
>> through kvm_apic_expired if the guest needs to be woken up from
>> kvm_vcpu_block."
> 
> Ah, I think he means that a sleeping vcpu (in kvm_vcpu_block) must
> be woken up if it receives a timer interrupt.

Yes, this is true.

Paolo

> But your patch will go through:
>
> kvm_apic_inject_pending_timer_irqs ->
> __apic_accept_irq ->
> vmx_deliver_posted_interrupt ->
> kvm_vcpu_trigger_posted_interrupt returns false
> (because vcpu->mode != IN_GUEST_MODE) ->
> kvm_vcpu_kick
>
> which will wake up the vcpu.
> 
> Apart from this oops, which triggers when running:
> taskset -c 1 ./cyclictest -D 3600 -p 99 -t 1 -h 30 -m -n  -i 50000 -b 40
> 
> Timer interrupt injection from housekeeping vcpus works normally for me
> (without requiring kvm_hlt_in_guest).
> 
> [ 1145.849646] BUG: kernel NULL pointer dereference, address: 0000000000000000
> [ 1145.850481] #PF: supervisor instruction fetch in kernel mode
> [ 1145.851161] #PF: error_code(0x0010) - not-present page
> [ 1145.851772] PGD 80000002a9fa5067 P4D 80000002a9fa5067 PUD 2abcbb067 PMD 0
> [ 1145.852578] Oops: 0010 [#1] PREEMPT SMP PTI
> [ 1145.853066] CPU: 2 PID: 0 Comm: swapper/2 Not tainted 5.2.0-rc1+ #11
> [ 1145.853809] Hardware name: Red Hat KVM, BIOS 1.11.0-2.el7 04/01/2014
> [ 1145.854554] RIP: 0010:0x0
> [ 1145.854879] Code: Bad RIP value.
> [ 1145.855270] RSP: 0018:ffffc90001903e68 EFLAGS: 00010013
> [ 1145.855902] RAX: 0000010ac9f60043 RBX: ffff8882b58a8320 RCX: 00000000c526b7c4
> [ 1145.856726] RDX: 0000000000000000 RSI: ffffffff820d9640 RDI: ffff8882b58a8320
> [ 1145.857560] RBP: ffffffff820d9640 R08: 00000000c526b7c4 R09: 0000000000000832
> [ 1145.858390] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
> [ 1145.859222] R13: ffffffff820d9658 R14: ffff8881063b2880 R15: 0000000000000002
> [ 1145.860047] FS:  0000000000000000(0000) GS:ffff8882b5880000(0000) knlGS:0000000000000000
> [ 1145.860994] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 1145.861692] CR2: ffffffffffffffd6 CR3: 00000002ab1de001 CR4: 0000000000160ee0
> [ 1145.862570] Call Trace:
> [ 1145.862877]  cpuidle_enter_state+0x7c/0x3e0
> [ 1145.863392]  cpuidle_enter+0x29/0x40
> 
> 
>> I think we can still be woken up from kvm_vcpu_block() if pir is set.
> 
> Exactly.
> 
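
For reference, the fallback discussed above (posted-interrupt delivery
falling back to a kick when the target vcpu is not in guest mode) can be
modeled in a few lines of plain userspace C. This is only a sketch under
stated assumptions: every toy_* name below is made up for illustration and
is not a KVM definition, and the real code paths do considerably more
(memory barriers, the ON bit, IPI delivery).

```c
#include <stdbool.h>

/* Toy model only: simplified stand-ins, not the real KVM types. */
enum toy_vcpu_mode { TOY_OUTSIDE_GUEST_MODE, TOY_IN_GUEST_MODE };

struct toy_vcpu {
	enum toy_vcpu_mode mode;
	bool blocked;       /* sleeping, as a vcpu would be in kvm_vcpu_block() */
	bool pir_pending;   /* timer vector recorded in the posted-interrupt request */
	int notifications;  /* posted-interrupt notifications sent */
	int kicks;          /* fallback kicks sent */
};

/* Stand-in for kvm_vcpu_trigger_posted_interrupt(): a notification is
 * only sent when the target vcpu is currently in guest mode. */
static bool toy_trigger_posted_interrupt(struct toy_vcpu *v)
{
	if (v->mode != TOY_IN_GUEST_MODE)
		return false;
	v->notifications++;
	return true;
}

/* Stand-in for kvm_vcpu_kick(): wakes a blocked vcpu. */
static void toy_vcpu_kick(struct toy_vcpu *v)
{
	v->kicks++;
	v->blocked = false;
}

/* Stand-in for the delivery step: record the interrupt first, then
 * either notify the running guest or fall back to a kick. */
static void toy_deliver_timer(struct toy_vcpu *v)
{
	v->pir_pending = true;
	if (!toy_trigger_posted_interrupt(v))
		toy_vcpu_kick(v);
}
```

Calling toy_deliver_timer() on a vcpu that is blocked and outside guest
mode takes the kick path and clears the blocked flag, while a vcpu in
guest mode only receives a notification and no kick.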

