linux-arm-kernel.lists.infradead.org archive mirror
* RCU stall with high number of KVM vcpus
       [not found] <20171113131000.GA10546@hc>
@ 2017-11-13 13:47 ` Marc Zyngier
  2017-11-13 17:35   ` Jan Glauber
  0 siblings, 1 reply; 11+ messages in thread
From: Marc Zyngier @ 2017-11-13 13:47 UTC (permalink / raw)
  To: linux-arm-kernel

On 13/11/17 13:10, Jan Glauber wrote:
> I'm seeing RCU stalls in the host with 4.14 when I run KVM on ARM64 (ThunderX2) with a high
> number of vcpus (60). I only use one guest that does kernel compiles in

Is that only reproducible on 4.14? With or without VHE? Can you
reproduce this on another implementation (such as ThunderX-1)?

> a loop. After some hours (less likely the more debugging options are
> enabled, more likely with more vcpus) RCU stalls are happening in both host & guest.
> 
> Both host & guest recover after some time, until the issue is triggered
> again. 
> 
> Stack traces in the guest are next to useless, everything is messed up
> there. The host seems to starve on kvm->mmu_lock spin lock, the lock_stat

Please elaborate. Messed in what way? Corrupted? The guest crashing? Or
is that a tooling issue?

> numbers don't look good, see waittime-max:
> 
> -----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>                               class name    con-bounces    contentions   waittime-min   waittime-max waittime-total   waittime-avg    acq-bounces   acquisitions   holdtime-min   holdtime-max holdtime-total   holdtime-avg
> -----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
> 
>                 &(&kvm->mmu_lock)->rlock:      99346764       99406604           0.14  1321260806.59 710654434972.0        7148.97      154228320      225122857           0.13   917688890.60  3705916481.39          16.46
>                 ------------------------
>                 &(&kvm->mmu_lock)->rlock       99365598          [<ffff0000080b43b8>] kvm_handle_guest_abort+0x4c0/0x950
>                 &(&kvm->mmu_lock)->rlock          25164          [<ffff0000080a4e30>] kvm_mmu_notifier_invalidate_range_start+0x70/0xe8
>                 &(&kvm->mmu_lock)->rlock          14934          [<ffff0000080a7eec>] kvm_mmu_notifier_invalidate_range_end+0x24/0x68
>                 &(&kvm->mmu_lock)->rlock            908          [<ffff00000810a1f0>] __cond_resched_lock+0x68/0xb8
>                 ------------------------
>                 &(&kvm->mmu_lock)->rlock              3          [<ffff0000080b34c8>] stage2_flush_vm+0x60/0xd8
>                 &(&kvm->mmu_lock)->rlock       99186296          [<ffff0000080b43b8>] kvm_handle_guest_abort+0x4c0/0x950
>                 &(&kvm->mmu_lock)->rlock         179238          [<ffff0000080a4e30>] kvm_mmu_notifier_invalidate_range_start+0x70/0xe8
>                 &(&kvm->mmu_lock)->rlock          19181          [<ffff0000080a7eec>] kvm_mmu_notifier_invalidate_range_end+0x24/0x68
> 
> .............................................................................................................................................................................................................................
[slots of stuff]

Well, the mmu_lock is clearly contended. Is the box in a state where you
are swapping? There seem to be as many faults as contentions, which is a
bit surprising...

Also, we recently moved arm64 to qrwlocks, which may have an impact.
Care to give this[1] a go and report the result?

Thanks,

	M.

[1]: https://lkml.org/lkml/2017/10/12/266
-- 
Jazz is not dead. It just smells funny...


* RCU stall with high number of KVM vcpus
  2017-11-13 13:47 ` RCU stall with high number of KVM vcpus Marc Zyngier
@ 2017-11-13 17:35   ` Jan Glauber
  2017-11-13 18:11     ` Marc Zyngier
  2017-11-13 18:13     ` Shameerali Kolothum Thodi
  0 siblings, 2 replies; 11+ messages in thread
From: Jan Glauber @ 2017-11-13 17:35 UTC (permalink / raw)
  To: linux-arm-kernel

On Mon, Nov 13, 2017 at 01:47:38PM +0000, Marc Zyngier wrote:
> On 13/11/17 13:10, Jan Glauber wrote:
> > I'm seeing RCU stalls in the host with 4.14 when I run KVM on ARM64 (ThunderX2) with a high
> > number of vcpus (60). I only use one guest that does kernel compiles in
> 
> Is that only reproducible on 4.14? With or without VHE? Can you
> reproduce this on another implementation (such as ThunderX-1)?

I've reproduced it on a distro 4.13 and several vanilla 4.14 rc's and
tip/locking. VHE is enabled. I've not yet tried to reproduce it with
older kernels or ThunderX-1. I can check if it happens also on ThunderX-1.

> > a loop. After some hours (less likely the more debugging options are
> > enabled, more likely with more vcpus) RCU stalls are happening in both host & guest.
> > 
> > Both host & guest recover after some time, until the issue is triggered
> > again. 
> > 
> > Stack traces in the guest are next to useless, everything is messed up
> > there. The host seems to starve on kvm->mmu_lock spin lock, the lock_stat
> 
> Please elaborate. Messed in what way? Corrupted? The guest crashing? Or
> is that a tooling issue?

Every vcpu that oopses prints one line in parallel, so I get blocks like:
[58880.179814] [<ffff000008084b98>] ret_from_fork+0x10/0x18
[58880.179834] [<ffff000008084b98>] ret_from_fork+0x10/0x18
[58880.179847] [<ffff000008084b98>] ret_from_fork+0x10/0x18
[58880.179873] [<ffff000008084b98>] ret_from_fork+0x10/0x18
[58880.179893] [<ffff000008084b98>] ret_from_fork+0x10/0x18
[58880.179911] [<ffff000008084b98>] ret_from_fork+0x10/0x18
[58880.179917] [<ffff000008084b98>] ret_from_fork+0x10/0x18
[58880.180288] [<ffff000008084b98>] ret_from_fork+0x10/0x18
[58880.180303] [<ffff000008084b98>] ret_from_fork+0x10/0x18
[58880.180336] [<ffff000008084b98>] ret_from_fork+0x10/0x18
[58880.180363] [<ffff000008084b98>] ret_from_fork+0x10/0x18
[58880.180384] [<ffff000008084b98>] ret_from_fork+0x10/0x18
[58880.180415] [<ffff000008084b98>] ret_from_fork+0x10/0x18
[58880.180461] [<ffff000008084b98>] ret_from_fork+0x10/0x18

I can send the full log if you want to have a look.

> > numbers don't look good, see waittime-max:
> > 
> > -----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
> >                               class name    con-bounces    contentions   waittime-min   waittime-max waittime-total   waittime-avg    acq-bounces   acquisitions   holdtime-min   holdtime-max holdtime-total   holdtime-avg
> > -----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
> > 
> >                 &(&kvm->mmu_lock)->rlock:      99346764       99406604           0.14  1321260806.59 710654434972.0        7148.97      154228320      225122857           0.13   917688890.60  3705916481.39          16.46
> >                 ------------------------
> >                 &(&kvm->mmu_lock)->rlock       99365598          [<ffff0000080b43b8>] kvm_handle_guest_abort+0x4c0/0x950
> >                 &(&kvm->mmu_lock)->rlock          25164          [<ffff0000080a4e30>] kvm_mmu_notifier_invalidate_range_start+0x70/0xe8
> >                 &(&kvm->mmu_lock)->rlock          14934          [<ffff0000080a7eec>] kvm_mmu_notifier_invalidate_range_end+0x24/0x68
> >                 &(&kvm->mmu_lock)->rlock            908          [<ffff00000810a1f0>] __cond_resched_lock+0x68/0xb8
> >                 ------------------------
> >                 &(&kvm->mmu_lock)->rlock              3          [<ffff0000080b34c8>] stage2_flush_vm+0x60/0xd8
> >                 &(&kvm->mmu_lock)->rlock       99186296          [<ffff0000080b43b8>] kvm_handle_guest_abort+0x4c0/0x950
> >                 &(&kvm->mmu_lock)->rlock         179238          [<ffff0000080a4e30>] kvm_mmu_notifier_invalidate_range_start+0x70/0xe8
> >                 &(&kvm->mmu_lock)->rlock          19181          [<ffff0000080a7eec>] kvm_mmu_notifier_invalidate_range_end+0x24/0x68
> > 
> > .............................................................................................................................................................................................................................
> [slots of stuff]
> 
> Well, the mmu_lock is clearly contended. Is the box in a state where you
> are swapping? There seem to be as many faults as contentions, which is a
> bit surprising...

I don't think it is swapping but need to double check.

> Also, we recently moved arm64 to qrwlocks, which may have an impact.
> Care to give this[1] a go and report the result?

Sure, that was my first suspicion, but I can reproduce the issue with
and without the qrwlock patches, so these are not to blame. Also, the
starving lock is a spinlock and not a qrwlock. So maybe the spinlocks
have fairness issues too?

--Jan

> Thanks,
> 
> 	M.
> 
> [1]: https://lkml.org/lkml/2017/10/12/266
> -- 
> Jazz is not dead. It just smells funny...


* RCU stall with high number of KVM vcpus
  2017-11-13 17:35   ` Jan Glauber
@ 2017-11-13 18:11     ` Marc Zyngier
  2017-11-13 18:40       ` Jan Glauber
  2017-11-14  7:52       ` Jan Glauber
  2017-11-13 18:13     ` Shameerali Kolothum Thodi
  1 sibling, 2 replies; 11+ messages in thread
From: Marc Zyngier @ 2017-11-13 18:11 UTC (permalink / raw)
  To: linux-arm-kernel

On 13/11/17 17:35, Jan Glauber wrote:
> On Mon, Nov 13, 2017 at 01:47:38PM +0000, Marc Zyngier wrote:
>> On 13/11/17 13:10, Jan Glauber wrote:
>>> I'm seeing RCU stalls in the host with 4.14 when I run KVM on ARM64 (ThunderX2) with a high
>>> number of vcpus (60). I only use one guest that does kernel compiles in
>>
>> Is that only reproducible on 4.14? With or without VHE? Can you
>> reproduce this on another implementation (such as ThunderX-1)?
> 
> I've reproduced it on a distro 4.13 and several vanilla 4.14 rc's and
> tip/locking. VHE is enabled. I've not yet tried to reproduce it with
> older kernels or ThunderX-1. I can check if it happens also on ThunderX-1.

That'd be interesting. Also, what happens if you turn off any form of
multi-threading (as I've been told this system has 4 threads per core)?
Disabling VHE would be interesting too (after all, this is the only box
I know of that has the feature...).

>>> a loop. After some hours (less likely the more debugging options are
>>> enabled, more likely with more vcpus) RCU stalls are happening in both host & guest.
>>>
>>> Both host & guest recover after some time, until the issue is triggered
>>> again. 
>>>
>>> Stack traces in the guest are next to useless, everything is messed up
>>> there. The host seems to starve on kvm->mmu_lock spin lock, the lock_stat
>>
>> Please elaborate. Messed in what way? Corrupted? The guest crashing? Or
>> is that a tooling issue?
> 
> Every vcpu that oopses prints one line in parallel, so I get blocks like:
> [58880.179814] [<ffff000008084b98>] ret_from_fork+0x10/0x18
> [58880.179834] [<ffff000008084b98>] ret_from_fork+0x10/0x18
> [58880.179847] [<ffff000008084b98>] ret_from_fork+0x10/0x18
> [58880.179873] [<ffff000008084b98>] ret_from_fork+0x10/0x18
> [58880.179893] [<ffff000008084b98>] ret_from_fork+0x10/0x18
> [58880.179911] [<ffff000008084b98>] ret_from_fork+0x10/0x18
> [58880.179917] [<ffff000008084b98>] ret_from_fork+0x10/0x18
> [58880.180288] [<ffff000008084b98>] ret_from_fork+0x10/0x18
> [58880.180303] [<ffff000008084b98>] ret_from_fork+0x10/0x18
> [58880.180336] [<ffff000008084b98>] ret_from_fork+0x10/0x18
> [58880.180363] [<ffff000008084b98>] ret_from_fork+0x10/0x18
> [58880.180384] [<ffff000008084b98>] ret_from_fork+0x10/0x18
> [58880.180415] [<ffff000008084b98>] ret_from_fork+0x10/0x18
> [58880.180461] [<ffff000008084b98>] ret_from_fork+0x10/0x18
> 
> I can send the full log if you want to have a look.

Sure, send that over (maybe not over email though).

> 
>>> numbers don't look good, see waittime-max:
>>>
>>> -----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>>>                               class name    con-bounces    contentions   waittime-min   waittime-max waittime-total   waittime-avg    acq-bounces   acquisitions   holdtime-min   holdtime-max holdtime-total   holdtime-avg
>>> -----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>>>
>>>                 &(&kvm->mmu_lock)->rlock:      99346764       99406604           0.14  1321260806.59 710654434972.0        7148.97      154228320      225122857           0.13   917688890.60  3705916481.39          16.46
>>>                 ------------------------
>>>                 &(&kvm->mmu_lock)->rlock       99365598          [<ffff0000080b43b8>] kvm_handle_guest_abort+0x4c0/0x950
>>>                 &(&kvm->mmu_lock)->rlock          25164          [<ffff0000080a4e30>] kvm_mmu_notifier_invalidate_range_start+0x70/0xe8
>>>                 &(&kvm->mmu_lock)->rlock          14934          [<ffff0000080a7eec>] kvm_mmu_notifier_invalidate_range_end+0x24/0x68
>>>                 &(&kvm->mmu_lock)->rlock            908          [<ffff00000810a1f0>] __cond_resched_lock+0x68/0xb8
>>>                 ------------------------
>>>                 &(&kvm->mmu_lock)->rlock              3          [<ffff0000080b34c8>] stage2_flush_vm+0x60/0xd8
>>>                 &(&kvm->mmu_lock)->rlock       99186296          [<ffff0000080b43b8>] kvm_handle_guest_abort+0x4c0/0x950
>>>                 &(&kvm->mmu_lock)->rlock         179238          [<ffff0000080a4e30>] kvm_mmu_notifier_invalidate_range_start+0x70/0xe8
>>>                 &(&kvm->mmu_lock)->rlock          19181          [<ffff0000080a7eec>] kvm_mmu_notifier_invalidate_range_end+0x24/0x68
>>>
>>> .............................................................................................................................................................................................................................
>> [slots of stuff]
>>
>> Well, the mmu_lock is clearly contended. Is the box in a state where you
>> are swapping? There seem to be as many faults as contentions, which is a
>> bit surprising...
> 
> I don't think it is swapping but need to double check.

It is the number of aborts that is staggering. And each one of them
leads to the mmu_lock being contended. So something seems to be taking
its sweet time holding the damned lock.

> 
>> Also, we recently moved arm64 to qrwlocks, which may have an impact.
>> Care to give this[1] a go and report the result?
> 
> Sure, that was my first suspicion, but I can reproduce the issue with
> and without the qrwlock patches, so these are not to blame. Also, the
> starving lock is a spinlock and not a qrwlock. So maybe the spinlocks
> have fairness issues too?

We use ticket locks, so they should be fine on the host (it is a
different story altogether in a guest).
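
To illustrate the fairness argument, here is a minimal generic C11 sketch of
the idea (not the arm64 assembly implementation): a waiter takes a ticket and
spins until the owner counter reaches it, so the lock is handed over strictly
in arrival order and no CPU can be starved by a faster neighbour, unlike a
plain test-and-set lock.

#include <stdatomic.h>

struct ticket_lock {
	atomic_uint next;	/* ticket dispenser */
	atomic_uint owner;	/* ticket currently being served */
};

static void ticket_lock_acquire(struct ticket_lock *l)
{
	unsigned int ticket = atomic_fetch_add(&l->next, 1);

	while (atomic_load(&l->owner) != ticket)
		;	/* spin: service order == arrival order */
}

static void ticket_lock_release(struct ticket_lock *l)
{
	atomic_fetch_add(&l->owner, 1);	/* pass the lock to the next ticket */
}

So if one ticket holder sits on the lock for a long time, everyone behind it
waits, but nobody is skipped.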

Can you send me your kernel configuration as well? I wonder if we're not
seeing interactions with things like KSM and the like...

Thanks,

	M.
-- 
Jazz is not dead. It just smells funny...


* RCU stall with high number of KVM vcpus
  2017-11-13 17:35   ` Jan Glauber
  2017-11-13 18:11     ` Marc Zyngier
@ 2017-11-13 18:13     ` Shameerali Kolothum Thodi
  2017-11-14  7:49       ` Jan Glauber
  1 sibling, 1 reply; 11+ messages in thread
From: Shameerali Kolothum Thodi @ 2017-11-13 18:13 UTC (permalink / raw)
  To: linux-arm-kernel



> -----Original Message-----
> From: linux-arm-kernel [mailto:linux-arm-kernel-bounces at lists.infradead.org]
> On Behalf Of Jan Glauber
> Sent: Monday, November 13, 2017 5:36 PM
> To: Marc Zyngier <marc.zyngier@arm.com>
> Cc: linux-arm-kernel at lists.infradead.org; Paolo Bonzini
> <pbonzini@redhat.com>; Christoffer Dall <christoffer.dall@linaro.org>;
> kvm at vger.kernel.org; Radim Krčmář <rkrcmar@redhat.com>
> Subject: Re: RCU stall with high number of KVM vcpus
> 
> On Mon, Nov 13, 2017 at 01:47:38PM +0000, Marc Zyngier wrote:
> > On 13/11/17 13:10, Jan Glauber wrote:
> > > I'm seeing RCU stalls in the host with 4.14 when I run KVM on ARM64
> (ThunderX2) with a high
> > > number of vcpus (60). I only use one guest that does kernel compiles in
> >
> > Is that only reproducible on 4.14? With or without VHE? Can you
> > reproduce this on another implementation (such as ThunderX-1)?
> 
> I've reproduced it on a distro 4.13 and several vanilla 4.14 rc's and
> tip/locking. VHE is enabled. I've not yet tried to reproduce it with
> older kernels or ThunderX-1. I can check if it happens also on ThunderX-1.
> 
> > > a loop. After some hours (less likely the more debugging options are
> > > enabled, more likely with more vcpus) RCU stalls are happening in both
> host & guest.
> > >
> > > Both host & guest recover after some time, until the issue is triggered
> > > again.
> > >
> > > Stack traces in the guest are next to useless, everything is messed up
> > > there. The host seems to starve on kvm->mmu_lock spin lock, the lock_stat
> >
> > Please elaborate. Messed in what way? Corrupted? The guest crashing? Or
> > is that a tooling issue?
> 
> Every vcpu that oopses prints one line in parallel, so I get blocks like:
> [58880.179814] [<ffff000008084b98>] ret_from_fork+0x10/0x18
> [58880.179834] [<ffff000008084b98>] ret_from_fork+0x10/0x18
> [58880.179847] [<ffff000008084b98>] ret_from_fork+0x10/0x18
> [58880.179873] [<ffff000008084b98>] ret_from_fork+0x10/0x18
> [58880.179893] [<ffff000008084b98>] ret_from_fork+0x10/0x18
> [58880.179911] [<ffff000008084b98>] ret_from_fork+0x10/0x18
> [58880.179917] [<ffff000008084b98>] ret_from_fork+0x10/0x18
> [58880.180288] [<ffff000008084b98>] ret_from_fork+0x10/0x18
> [58880.180303] [<ffff000008084b98>] ret_from_fork+0x10/0x18
> [58880.180336] [<ffff000008084b98>] ret_from_fork+0x10/0x18
> [58880.180363] [<ffff000008084b98>] ret_from_fork+0x10/0x18
> [58880.180384] [<ffff000008084b98>] ret_from_fork+0x10/0x18
> [58880.180415] [<ffff000008084b98>] ret_from_fork+0x10/0x18
> [58880.180461] [<ffff000008084b98>] ret_from_fork+0x10/0x18
> 
> I can send the full log if you want to have a look.
> 
> > > numbers don't look good, see waittime-max:
> > >
> > > -----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
> > >                               class name    con-bounces    contentions   waittime-min   waittime-max waittime-total   waittime-avg    acq-bounces   acquisitions   holdtime-min   holdtime-max holdtime-total   holdtime-avg
> > > -----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
> > >
> > >                 &(&kvm->mmu_lock)->rlock:      99346764       99406604           0.14  1321260806.59 710654434972.0        7148.97      154228320      225122857           0.13   917688890.60  3705916481.39          16.46
> > >                 ------------------------
> > >                 &(&kvm->mmu_lock)->rlock       99365598          [<ffff0000080b43b8>] kvm_handle_guest_abort+0x4c0/0x950
> > >                 &(&kvm->mmu_lock)->rlock          25164          [<ffff0000080a4e30>] kvm_mmu_notifier_invalidate_range_start+0x70/0xe8
> > >                 &(&kvm->mmu_lock)->rlock          14934          [<ffff0000080a7eec>] kvm_mmu_notifier_invalidate_range_end+0x24/0x68
> > >                 &(&kvm->mmu_lock)->rlock            908          [<ffff00000810a1f0>] __cond_resched_lock+0x68/0xb8
> > >                 ------------------------
> > >                 &(&kvm->mmu_lock)->rlock              3          [<ffff0000080b34c8>] stage2_flush_vm+0x60/0xd8
> > >                 &(&kvm->mmu_lock)->rlock       99186296          [<ffff0000080b43b8>] kvm_handle_guest_abort+0x4c0/0x950
> > >                 &(&kvm->mmu_lock)->rlock         179238          [<ffff0000080a4e30>] kvm_mmu_notifier_invalidate_range_start+0x70/0xe8
> > >                 &(&kvm->mmu_lock)->rlock          19181          [<ffff0000080a7eec>] kvm_mmu_notifier_invalidate_range_end+0x24/0x68

That looks similar to something we had on our hip07 platform when multiple VMs
were launched. The issue was tracked down to CONFIG_NUMA being set with memoryless
nodes. This results in a lot of individual 4K pages, and unmap_stage2_ptes() takes a good
amount of time coupled with some HW cache flush latencies. I am not sure you are
seeing the same thing, but it may be worth checking.
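
To make that concrete, here is a rough sketch of what such an unmap looks like
under the lock (loosely modelled on unmap_stage2_range(); the helper below is
illustrative only, not the actual code). When the guest memory is mapped as
individual 4K pages instead of larger blocks, the loop runs once per page, and
the per-page cache/TLB maintenance is what stretches the lock hold time.

static void stage2_unmap_sketch(struct kvm *kvm, phys_addr_t start, u64 size)
{
	phys_addr_t addr, end = start + size;

	assert_spin_locked(&kvm->mmu_lock);

	for (addr = start; addr < end; addr += PAGE_SIZE) {
		/* ... clear the stage-2 PTE for addr, flush dcache/TLB for it ... */

		/*
		 * The real walker yields periodically via cond_resched_lock()
		 * -- visible as __cond_resched_lock in the lock_stat output
		 * above -- but each resume has to re-contend the lock.
		 */
		if (need_resched() || spin_needbreak(&kvm->mmu_lock))
			cond_resched_lock(&kvm->mmu_lock);
	}
}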

Thanks,
Shameer


* RCU stall with high number of KVM vcpus
  2017-11-13 18:11     ` Marc Zyngier
@ 2017-11-13 18:40       ` Jan Glauber
  2017-11-14 13:30         ` Marc Zyngier
  2017-11-14  7:52       ` Jan Glauber
  1 sibling, 1 reply; 11+ messages in thread
From: Jan Glauber @ 2017-11-13 18:40 UTC (permalink / raw)
  To: linux-arm-kernel

On Mon, Nov 13, 2017 at 06:11:19PM +0000, Marc Zyngier wrote:
> On 13/11/17 17:35, Jan Glauber wrote:
> > On Mon, Nov 13, 2017 at 01:47:38PM +0000, Marc Zyngier wrote:

[...]

> >> Please elaborate. Messed in what way? Corrupted? The guest crashing? Or
> >> is that a tooling issue?
> > 
> > Every vcpu that oopses prints one line in parallel, so I get blocks like:
> > [58880.179814] [<ffff000008084b98>] ret_from_fork+0x10/0x18
> > [58880.179834] [<ffff000008084b98>] ret_from_fork+0x10/0x18
> > [58880.179847] [<ffff000008084b98>] ret_from_fork+0x10/0x18
> > [58880.179873] [<ffff000008084b98>] ret_from_fork+0x10/0x18
> > [58880.179893] [<ffff000008084b98>] ret_from_fork+0x10/0x18
> > [58880.179911] [<ffff000008084b98>] ret_from_fork+0x10/0x18
> > [58880.179917] [<ffff000008084b98>] ret_from_fork+0x10/0x18
> > [58880.180288] [<ffff000008084b98>] ret_from_fork+0x10/0x18
> > [58880.180303] [<ffff000008084b98>] ret_from_fork+0x10/0x18
> > [58880.180336] [<ffff000008084b98>] ret_from_fork+0x10/0x18
> > [58880.180363] [<ffff000008084b98>] ret_from_fork+0x10/0x18
> > [58880.180384] [<ffff000008084b98>] ret_from_fork+0x10/0x18
> > [58880.180415] [<ffff000008084b98>] ret_from_fork+0x10/0x18
> > [58880.180461] [<ffff000008084b98>] ret_from_fork+0x10/0x18
> > 
> > I can send the full log if you want to have a look.
> 
> Sure, send that over (maybe not over email though).

Here is the guest dmesg:
http://paste.ubuntu.com/25955682/

And the host dmesg as it might have been too big for the lists:
http://paste.ubuntu.com/25955699/



> Can you send me your kernel configuration as well? I wonder if we're not
> seeing interactions with things like KSM and the like...

It was a plain arm64 defconfig, but I disabled
HAVE_KVM_CPU_RELAX_INTERCEPT. Not that it had an impact; I just tried it.

http://paste.ubuntu.com/25955726/

--Jan


* RCU stall with high number of KVM vcpus
  2017-11-13 18:13     ` Shameerali Kolothum Thodi
@ 2017-11-14  7:49       ` Jan Glauber
  0 siblings, 0 replies; 11+ messages in thread
From: Jan Glauber @ 2017-11-14  7:49 UTC (permalink / raw)
  To: linux-arm-kernel

On Mon, Nov 13, 2017 at 06:13:08PM +0000, Shameerali Kolothum Thodi wrote:

[...]

> > > > numbers don't look good, see waittime-max:
> > > >
> > > > -----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
> > > >                               class name    con-bounces    contentions   waittime-min   waittime-max waittime-total   waittime-avg    acq-bounces   acquisitions   holdtime-min   holdtime-max holdtime-total   holdtime-avg
> > > > -----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
> > > >
> > > >                 &(&kvm->mmu_lock)->rlock:      99346764       99406604           0.14  1321260806.59 710654434972.0        7148.97      154228320      225122857           0.13   917688890.60  3705916481.39          16.46
> > > >                 ------------------------
> > > >                 &(&kvm->mmu_lock)->rlock       99365598          [<ffff0000080b43b8>] kvm_handle_guest_abort+0x4c0/0x950
> > > >                 &(&kvm->mmu_lock)->rlock          25164          [<ffff0000080a4e30>] kvm_mmu_notifier_invalidate_range_start+0x70/0xe8
> > > >                 &(&kvm->mmu_lock)->rlock          14934          [<ffff0000080a7eec>] kvm_mmu_notifier_invalidate_range_end+0x24/0x68
> > > >                 &(&kvm->mmu_lock)->rlock            908          [<ffff00000810a1f0>] __cond_resched_lock+0x68/0xb8
> > > >                 ------------------------
> > > >                 &(&kvm->mmu_lock)->rlock              3          [<ffff0000080b34c8>] stage2_flush_vm+0x60/0xd8
> > > >                 &(&kvm->mmu_lock)->rlock       99186296          [<ffff0000080b43b8>] kvm_handle_guest_abort+0x4c0/0x950
> > > >                 &(&kvm->mmu_lock)->rlock         179238          [<ffff0000080a4e30>] kvm_mmu_notifier_invalidate_range_start+0x70/0xe8
> > > >                 &(&kvm->mmu_lock)->rlock          19181          [<ffff0000080a7eec>] kvm_mmu_notifier_invalidate_range_end+0x24/0x68
> 
> That looks similar to something we had on our hip07 platform when multiple VMs
> were launched. The issue was tracked down to CONFIG_NUMA being set with memoryless
> nodes. This results in a lot of individual 4K pages, and unmap_stage2_ptes() takes a good
> amount of time coupled with some HW cache flush latencies. I am not sure you are
> seeing the same thing, but it may be worth checking.

Hi Shameer,

Thanks for the tip. We don't have memoryless nodes, but it might be
related to NUMA. I've tried putting the guest onto one node, but that did
not help.

PID                               Node 0          Node 1           Total
-----------------------  --------------- --------------- ---------------
56753 (qemu-nbd)                    4.48           11.16           15.64
56813 (qemu-system-aar)             2.02         1685.72         1687.75
-----------------------  --------------- --------------- ---------------
Total                               6.51         1696.88         1703.39

I'll try switching to 64K pages in the host next.

thanks,
Jan


* RCU stall with high number of KVM vcpus
  2017-11-13 18:11     ` Marc Zyngier
  2017-11-13 18:40       ` Jan Glauber
@ 2017-11-14  7:52       ` Jan Glauber
  2017-11-14  8:49         ` Marc Zyngier
  1 sibling, 1 reply; 11+ messages in thread
From: Jan Glauber @ 2017-11-14  7:52 UTC (permalink / raw)
  To: linux-arm-kernel

On Mon, Nov 13, 2017 at 06:11:19PM +0000, Marc Zyngier wrote:
> On 13/11/17 17:35, Jan Glauber wrote:

[...]

> >>> numbers don't look good, see waittime-max:
> >>>
> >>> -----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
> >>>                               class name    con-bounces    contentions   waittime-min   waittime-max waittime-total   waittime-avg    acq-bounces   acquisitions   holdtime-min   holdtime-max holdtime-total   holdtime-avg
> >>> -----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
> >>>
> >>>                 &(&kvm->mmu_lock)->rlock:      99346764       99406604           0.14  1321260806.59 710654434972.0        7148.97      154228320      225122857           0.13   917688890.60  3705916481.39          16.46
> >>>                 ------------------------
> >>>                 &(&kvm->mmu_lock)->rlock       99365598          [<ffff0000080b43b8>] kvm_handle_guest_abort+0x4c0/0x950
> >>>                 &(&kvm->mmu_lock)->rlock          25164          [<ffff0000080a4e30>] kvm_mmu_notifier_invalidate_range_start+0x70/0xe8
> >>>                 &(&kvm->mmu_lock)->rlock          14934          [<ffff0000080a7eec>] kvm_mmu_notifier_invalidate_range_end+0x24/0x68
> >>>                 &(&kvm->mmu_lock)->rlock            908          [<ffff00000810a1f0>] __cond_resched_lock+0x68/0xb8
> >>>                 ------------------------
> >>>                 &(&kvm->mmu_lock)->rlock              3          [<ffff0000080b34c8>] stage2_flush_vm+0x60/0xd8
> >>>                 &(&kvm->mmu_lock)->rlock       99186296          [<ffff0000080b43b8>] kvm_handle_guest_abort+0x4c0/0x950
> >>>                 &(&kvm->mmu_lock)->rlock         179238          [<ffff0000080a4e30>] kvm_mmu_notifier_invalidate_range_start+0x70/0xe8
> >>>                 &(&kvm->mmu_lock)->rlock          19181          [<ffff0000080a7eec>] kvm_mmu_notifier_invalidate_range_end+0x24/0x68
> >>>
> >>> .............................................................................................................................................................................................................................
> >> [slots of stuff]
> >>
> >> Well, the mmu_lock is clearly contended. Is the box in a state where you
> >> are swapping? There seem to be as many faults as contentions, which is a
> >> bit surprising...
> > 
> > I don't think it is swapping but need to double check.
> 
> It is the number of aborts that is staggering. And each one of them
> leads to the mmu_lock being contended. So something seems to be taking
> its sweet time holding the damned lock.

Can you elaborate on the aborts? I'm not familiar with KVM, but from a
first look I thought kvm_handle_guest_abort() is in the normal path
when a vcpu is stopped. Is that wrong?

--Jan


* RCU stall with high number of KVM vcpus
  2017-11-14  7:52       ` Jan Glauber
@ 2017-11-14  8:49         ` Marc Zyngier
  2017-11-14 11:34           ` Suzuki K Poulose
  0 siblings, 1 reply; 11+ messages in thread
From: Marc Zyngier @ 2017-11-14  8:49 UTC (permalink / raw)
  To: linux-arm-kernel

On 14/11/17 07:52, Jan Glauber wrote:
> On Mon, Nov 13, 2017 at 06:11:19PM +0000, Marc Zyngier wrote:
>> On 13/11/17 17:35, Jan Glauber wrote:
> 
> [...]
> 
>>>>> numbers don't look good, see waittime-max:
>>>>>
>>>>> -----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>>>>>                               class name    con-bounces    contentions   waittime-min   waittime-max waittime-total   waittime-avg    acq-bounces   acquisitions   holdtime-min   holdtime-max holdtime-total   holdtime-avg
>>>>> -----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>>>>>
>>>>>                 &(&kvm->mmu_lock)->rlock:      99346764       99406604           0.14  1321260806.59 710654434972.0        7148.97      154228320      225122857           0.13   917688890.60  3705916481.39          16.46
>>>>>                 ------------------------
>>>>>                 &(&kvm->mmu_lock)->rlock       99365598          [<ffff0000080b43b8>] kvm_handle_guest_abort+0x4c0/0x950
>>>>>                 &(&kvm->mmu_lock)->rlock          25164          [<ffff0000080a4e30>] kvm_mmu_notifier_invalidate_range_start+0x70/0xe8
>>>>>                 &(&kvm->mmu_lock)->rlock          14934          [<ffff0000080a7eec>] kvm_mmu_notifier_invalidate_range_end+0x24/0x68
>>>>>                 &(&kvm->mmu_lock)->rlock            908          [<ffff00000810a1f0>] __cond_resched_lock+0x68/0xb8
>>>>>                 ------------------------
>>>>>                 &(&kvm->mmu_lock)->rlock              3          [<ffff0000080b34c8>] stage2_flush_vm+0x60/0xd8
>>>>>                 &(&kvm->mmu_lock)->rlock       99186296          [<ffff0000080b43b8>] kvm_handle_guest_abort+0x4c0/0x950
>>>>>                 &(&kvm->mmu_lock)->rlock         179238          [<ffff0000080a4e30>] kvm_mmu_notifier_invalidate_range_start+0x70/0xe8
>>>>>                 &(&kvm->mmu_lock)->rlock          19181          [<ffff0000080a7eec>] kvm_mmu_notifier_invalidate_range_end+0x24/0x68
>>>>>
>>>>> .............................................................................................................................................................................................................................
>>>> [slots of stuff]
>>>>
>>>> Well, the mmu_lock is clearly contended. Is the box in a state where you
>>>> are swapping? There seem to be as many faults as contentions, which is a
>>>> bit surprising...
>>>
>>> I don't think it is swapping but need to double check.
>>
>> It is the number of aborts that is staggering. And each one of them
>> leads to the mmu_lock being contended. So something seems to be taking
>> its sweet time holding the damned lock.
> 
> Can you elaborate on the aborts? I'm not familiar with KVM, but from a
> first look I thought kvm_handle_guest_abort() is in the normal path
> when a vcpu is stopped. Is that wrong?

kvm_handle_guest_abort() is the entry point for our page fault handling
(hence the mmu_lock being taken). On its own, the number of faults is
irrelevant. What worries me is that in almost all the cases the lock was
contended, we were handling a page fault.
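
Roughly, the path looks like this (a simplified sketch loosely following
virt/kvm/arm/mmu.c; the helper name and the details are illustrative, not the
literal 4.14 code). The pfn lookup happens outside the lock, but installing
the mapping is serialised on the single per-VM mmu_lock, which is why every
fault shows up as a contention when someone else holds that lock.

static int stage2_fault_sketch(struct kvm *kvm, phys_addr_t fault_ipa,
			       bool write_fault)
{
	unsigned long mmu_seq = kvm->mmu_notifier_seq; /* sampled before the pfn lookup */
	kvm_pfn_t pfn;

	/* Resolve the backing host page outside the lock; this can sleep. */
	pfn = gfn_to_pfn_prot(kvm, fault_ipa >> PAGE_SHIFT, write_fault, NULL);
	if (is_error_pfn(pfn))
		return -EFAULT;

	spin_lock(&kvm->mmu_lock);		/* the lock from the lock_stat above */
	if (mmu_notifier_retry(kvm, mmu_seq))	/* raced with an invalidate: bail and refault */
		goto out_unlock;

	/* ... install the stage-2 PTE/PMD for fault_ipa -> pfn ... */

out_unlock:
	spin_unlock(&kvm->mmu_lock);
	kvm_release_pfn_clean(pfn);
	return 0;
}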

What would be interesting is to find out *who* is holding the lock when
we're being blocked in kvm_handle_guest_abort...

	M.
-- 
Jazz is not dead. It just smells funny...


* RCU stall with high number of KVM vcpus
  2017-11-14  8:49         ` Marc Zyngier
@ 2017-11-14 11:34           ` Suzuki K Poulose
  0 siblings, 0 replies; 11+ messages in thread
From: Suzuki K Poulose @ 2017-11-14 11:34 UTC (permalink / raw)
  To: linux-arm-kernel

On 14/11/17 08:49, Marc Zyngier wrote:
> On 14/11/17 07:52, Jan Glauber wrote:
>> On Mon, Nov 13, 2017 at 06:11:19PM +0000, Marc Zyngier wrote:
>>> On 13/11/17 17:35, Jan Glauber wrote:
>>
>> [...]
>>
>>>>>> numbers don't look good, see waittime-max:
>>>>>>
>>>>>> -----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>>>>>>                                class name    con-bounces    contentions   waittime-min   waittime-max waittime-total   waittime-avg    acq-bounces   acquisitions   holdtime-min   holdtime-max holdtime-total   holdtime-avg
>>>>>> -----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>>>>>>
>>>>>>                  &(&kvm->mmu_lock)->rlock:      99346764       99406604           0.14  1321260806.59 710654434972.0        7148.97      154228320      225122857           0.13   917688890.60  3705916481.39          16.46
>>>>>>                  ------------------------
>>>>>>                  &(&kvm->mmu_lock)->rlock       99365598          [<ffff0000080b43b8>] kvm_handle_guest_abort+0x4c0/0x950
>>>>>>                  &(&kvm->mmu_lock)->rlock          25164          [<ffff0000080a4e30>] kvm_mmu_notifier_invalidate_range_start+0x70/0xe8
>>>>>>                  &(&kvm->mmu_lock)->rlock          14934          [<ffff0000080a7eec>] kvm_mmu_notifier_invalidate_range_end+0x24/0x68
>>>>>>                  &(&kvm->mmu_lock)->rlock            908          [<ffff00000810a1f0>] __cond_resched_lock+0x68/0xb8
>>>>>>                  ------------------------
>>>>>>                  &(&kvm->mmu_lock)->rlock              3          [<ffff0000080b34c8>] stage2_flush_vm+0x60/0xd8
>>>>>>                  &(&kvm->mmu_lock)->rlock       99186296          [<ffff0000080b43b8>] kvm_handle_guest_abort+0x4c0/0x950
>>>>>>                  &(&kvm->mmu_lock)->rlock         179238          [<ffff0000080a4e30>] kvm_mmu_notifier_invalidate_range_start+0x70/0xe8
>>>>>>                  &(&kvm->mmu_lock)->rlock          19181          [<ffff0000080a7eec>] kvm_mmu_notifier_invalidate_range_end+0x24/0x68
>>>>>>
>>>>>> .............................................................................................................................................................................................................................
>>>>> [slots of stuff]
>>>>>
>>>>> Well, the mmu_lock is clearly contended. Is the box in a state where you
>>>>> are swapping? There seem to be as many faults as contentions, which is a
>>>>> bit surprising...
>>>>
>>>> I don't think it is swapping but need to double check.
>>>
>>> It is the number of aborts that is staggering. And each one of them
>>> leads to the mmu_lock being contended. So something seems to be taking
>>> its sweet time holding the damned lock.
>>
>> Can you elaborate on the aborts? I'm not familiar with KVM, but from a
>> first look I thought kvm_handle_guest_abort() is in the normal path
>> when a vcpu is stopped. Is that wrong?
> 
> kvm_handle_guest_abort() is the entry point for our page fault handling
> (hence the mmu_lock being taken). On its own, the number of faults is
> irrelevant. What worries me is that in almost all the cases the lock was
> contended, we were handling a page fault.
> 
> What would be interesting is to find out *who* is holding the lock when
> we're being blocked in kvm_handle_guest_abort...

Just a thought: turning on the tracepoints for kvm_hva_* might help to get some
more data on what we are doing with the HVA ranges.
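
For example, something along these lines should switch them on (the tracefs
path and the kvm_unmap_hva_range / kvm_age_hva / kvm_set_spte_hva event names
are from memory for that kernel generation -- double-check the exact names
under events/kvm/ on the running kernel first):

#include <stdio.h>

static const char * const kvm_hva_events[] = {
	"/sys/kernel/debug/tracing/events/kvm/kvm_unmap_hva_range/enable",
	"/sys/kernel/debug/tracing/events/kvm/kvm_age_hva/enable",
	"/sys/kernel/debug/tracing/events/kvm/kvm_set_spte_hva/enable",
};

int main(void)
{
	unsigned int i;

	for (i = 0; i < sizeof(kvm_hva_events) / sizeof(kvm_hva_events[0]); i++) {
		FILE *f = fopen(kvm_hva_events[i], "w");

		if (!f) {
			perror(kvm_hva_events[i]);
			continue;
		}
		fputs("1\n", f);	/* enable this tracepoint */
		fclose(f);
	}
	return 0;
}

The resulting events in the trace buffer should show which HVA ranges are
being invalidated while the faults are piling up.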

Cheers
Suzuki


* RCU stall with high number of KVM vcpus
  2017-11-13 18:40       ` Jan Glauber
@ 2017-11-14 13:30         ` Marc Zyngier
  2017-11-14 14:19           ` Jan Glauber
  0 siblings, 1 reply; 11+ messages in thread
From: Marc Zyngier @ 2017-11-14 13:30 UTC (permalink / raw)
  To: linux-arm-kernel

On 13/11/17 18:40, Jan Glauber wrote:
> On Mon, Nov 13, 2017 at 06:11:19PM +0000, Marc Zyngier wrote:
>> On 13/11/17 17:35, Jan Glauber wrote:
>>> On Mon, Nov 13, 2017 at 01:47:38PM +0000, Marc Zyngier wrote:
> 
> [...]
> 
>>>> Please elaborate. Messed in what way? Corrupted? The guest crashing? Or
>>>> is that a tooling issue?
>>>
>>> Every vcpu that oopses prints one line in parallel, so I get blocks like:
>>> [58880.179814] [<ffff000008084b98>] ret_from_fork+0x10/0x18
>>> [58880.179834] [<ffff000008084b98>] ret_from_fork+0x10/0x18
>>> [58880.179847] [<ffff000008084b98>] ret_from_fork+0x10/0x18
>>> [58880.179873] [<ffff000008084b98>] ret_from_fork+0x10/0x18
>>> [58880.179893] [<ffff000008084b98>] ret_from_fork+0x10/0x18
>>> [58880.179911] [<ffff000008084b98>] ret_from_fork+0x10/0x18
>>> [58880.179917] [<ffff000008084b98>] ret_from_fork+0x10/0x18
>>> [58880.180288] [<ffff000008084b98>] ret_from_fork+0x10/0x18
>>> [58880.180303] [<ffff000008084b98>] ret_from_fork+0x10/0x18
>>> [58880.180336] [<ffff000008084b98>] ret_from_fork+0x10/0x18
>>> [58880.180363] [<ffff000008084b98>] ret_from_fork+0x10/0x18
>>> [58880.180384] [<ffff000008084b98>] ret_from_fork+0x10/0x18
>>> [58880.180415] [<ffff000008084b98>] ret_from_fork+0x10/0x18
>>> [58880.180461] [<ffff000008084b98>] ret_from_fork+0x10/0x18
>>>
>>> I can send the full log if you want to have a look.
>>
>> Sure, send that over (maybe not over email though).
> 
> Here is the guest dmesg:
> http://paste.ubuntu.com/25955682/

Yeah, that's because all the vcpus are getting starved at the same time,
and spitting out interleaved traces... Not very useful anyway, as I
think this is only a consequence of what's happening on the host.

> 
> And the host dmesg as it might have been too big for the lists:
> http://paste.ubuntu.com/25955699/

And that one doesn't show much either, apart from indicating that
something is keeping the lock for itself. Drat.

We need to narrow down the problem, or make it appear on more common HW.
Let me know if you've managed to reproduce it with non-VHE and/or on TX-1.

Thanks,

	M.
-- 
Jazz is not dead. It just smells funny...


* RCU stall with high number of KVM vcpus
  2017-11-14 13:30         ` Marc Zyngier
@ 2017-11-14 14:19           ` Jan Glauber
  0 siblings, 0 replies; 11+ messages in thread
From: Jan Glauber @ 2017-11-14 14:19 UTC (permalink / raw)
  To: linux-arm-kernel

On Tue, Nov 14, 2017 at 01:30:07PM +0000, Marc Zyngier wrote:
> On 13/11/17 18:40, Jan Glauber wrote:
> > On Mon, Nov 13, 2017 at 06:11:19PM +0000, Marc Zyngier wrote:
> >> On 13/11/17 17:35, Jan Glauber wrote:
> >>> On Mon, Nov 13, 2017 at 01:47:38PM +0000, Marc Zyngier wrote:
> > 
> > [...]
> > 
> >>>> Please elaborate. Messed in what way? Corrupted? The guest crashing? Or
> >>>> is that a tooling issue?
> >>>
> >>> Every vcpu that oopses prints one line in parallel, so I get blocks like:
> >>> [58880.179814] [<ffff000008084b98>] ret_from_fork+0x10/0x18
> >>> [58880.179834] [<ffff000008084b98>] ret_from_fork+0x10/0x18
> >>> [58880.179847] [<ffff000008084b98>] ret_from_fork+0x10/0x18
> >>> [58880.179873] [<ffff000008084b98>] ret_from_fork+0x10/0x18
> >>> [58880.179893] [<ffff000008084b98>] ret_from_fork+0x10/0x18
> >>> [58880.179911] [<ffff000008084b98>] ret_from_fork+0x10/0x18
> >>> [58880.179917] [<ffff000008084b98>] ret_from_fork+0x10/0x18
> >>> [58880.180288] [<ffff000008084b98>] ret_from_fork+0x10/0x18
> >>> [58880.180303] [<ffff000008084b98>] ret_from_fork+0x10/0x18
> >>> [58880.180336] [<ffff000008084b98>] ret_from_fork+0x10/0x18
> >>> [58880.180363] [<ffff000008084b98>] ret_from_fork+0x10/0x18
> >>> [58880.180384] [<ffff000008084b98>] ret_from_fork+0x10/0x18
> >>> [58880.180415] [<ffff000008084b98>] ret_from_fork+0x10/0x18
> >>> [58880.180461] [<ffff000008084b98>] ret_from_fork+0x10/0x18
> >>>
> >>> I can send the full log if you want to have a look.
> >>
> >> Sure, send that over (maybe not over email though).
> > 
> > Here is the guest dmesg:
> > http://paste.ubuntu.com/25955682/
> 
> Yeah, that's because all the vcpus are getting starved at the same time,
> and spitting out interleaved traces... Not very useful anyway, as I
> think this is only a consequence of what's happening on the host.
> 
> > 
> > And the host dmesg as it might have been too big for the lists:
> > http://paste.ubuntu.com/25955699/
> 
> And that one doesn't show much either, apart from indicating that
> something is keeping the lock for itself. Drat.
> 
> We need to narrow down the problem, or make it appear on more common HW.
> Let me know if you've managed to reproduce it with non-VHE and/or on TX-1.

It also shows up when I disable VHE (CONFIG_ARM64_VHE). I'll try
enabling some tracepoints next.

--Jan

