All of lore.kernel.org
* KVM/ARM: sleeping function called from invalid context
@ 2017-03-30 14:31 ` Mark Rutland
  0 siblings, 0 replies; 13+ messages in thread
From: Mark Rutland @ 2017-03-30 14:31 UTC (permalink / raw)
  To: linux-arm-kernel, kvmarm, kvm
  Cc: suzuki.poulose, marc.zyngier, christoffer.dall

Hi,

I'm seeing the splat below when running KVM on an arm64 host with
CONFIG_DEBUG_ATOMIC_SLEEP and CONFIG_LOCKDEP enabled.

I saw this on v4.11-rc1, and I can reproduce the problem on the current
kvmarm master branch (563e2f5daa66fbc1).

I've hacked noinlines into arch/arm/kvm/mmu.c in an attempt to get a
better backtrace; without this, the report says the call is at
arch/arm/kvm/mmu.c:299, which is somewhat confusing.

Splat:

[  135.549391] BUG: sleeping function called from invalid context at arch/arm64/kvm/../../../arch/arm/kvm/mmu.c:302
[  135.559712] in_atomic(): 1, irqs_disabled(): 0, pid: 2311, name: kvm-vcpu-0
[  135.566709] 8 locks held by kvm-vcpu-0/2311:
[  135.571010]  #0:  (&vcpu->mutex){+.+.+.}, at: [<ffff2000080d7c70>] vcpu_load+0x28/0x2b0
[  135.579075]  #1:  (&kvm->srcu){......}, at: [<ffff2000080ff358>] kvm_handle_guest_abort+0x208/0x7d0
[  135.588177]  #2:  (&mm->mmap_sem){++++++}, at: [<ffff2000084d57f4>] get_user_pages_unlocked+0xbc/0x370
[  135.597540]  #3:  (&anon_vma->rwsem){++++..}, at: [<ffff20000851b064>] page_lock_anon_vma_read+0x164/0x588
[  135.607244]  #4:  (&(ptlock_ptr(page))->rlock){+.+.-.}, at: [<ffff20000850e2c0>] page_vma_mapped_walk+0x8b0/0x14f0
[  135.617647]  #5:  (&srcu){......}, at: [<ffff200008571c9c>] __mmu_notifier_invalidate_page+0x10c/0x3c0
[  135.627012]  #6:  (&kvm->srcu){......}, at: [<ffff2000080dafd4>] kvm_mmu_notifier_invalidate_page+0x10c/0x320
[  135.636980]  #7:  (&(&kvm->mmu_lock)->rlock){+.+.-.}, at: [<ffff2000080db00c>] kvm_mmu_notifier_invalidate_page+0x144/0x320
[  135.648180] CPU: 1 PID: 2311 Comm: kvm-vcpu-0 Not tainted 4.11.0-rc1-00006-gf9bc6f5-dirty #2
[  135.656616] Hardware name: AMD Seattle (Rev.B0) Development Board (Overdrive) (DT)
[  135.664183] Call trace:
[  135.666636] [<ffff200008094ee0>] dump_backtrace+0x0/0x588
[  135.672039] [<ffff200008095488>] show_stack+0x20/0x30
[  135.677095] [<ffff200008c34a44>] dump_stack+0x16c/0x1e0
[  135.682325] [<ffff2000081cb0fc>] ___might_sleep+0x2e4/0x508
[  135.687902] [<ffff2000080faee4>] unmap_stage2_range+0x114/0x200
[  135.693825] [<ffff2000080faff8>] kvm_unmap_hva_handler+0x28/0x38
[  135.699836] [<ffff2000080fb908>] handle_hva_to_gpa+0x178/0x2a0
[  135.705672] [<ffff2000080ff984>] kvm_unmap_hva+0x64/0xa0
[  135.710989] [<ffff2000080db048>] kvm_mmu_notifier_invalidate_page+0x180/0x320
[  135.718128] [<ffff200008571d6c>] __mmu_notifier_invalidate_page+0x1dc/0x3c0
[  135.725094] [<ffff200008517cb8>] try_to_unmap_one+0x4a0/0x1360
[  135.730931] [<ffff2000085121c0>] rmap_walk_anon+0x2d0/0xa68
[  135.736507] [<ffff20000851b58c>] rmap_walk+0x104/0x1e0
[  135.741649] [<ffff20000851c1b8>] try_to_unmap+0x1b8/0x500
[  135.747053] [<ffff20000859bbcc>] __unmap_and_move+0x364/0x938
[  135.752802] [<ffff20000859c31c>] unmap_and_move.isra.3+0x17c/0xd40
[  135.758987] [<ffff20000859e0f0>] migrate_pages+0x228/0x960
[  135.764477] [<ffff2000084c232c>] compact_zone+0xeec/0x1d10
[  135.769967] [<ffff2000084c3264>] compact_zone_order+0x114/0x198
[  135.775891] [<ffff2000084c5260>] try_to_compact_pages+0x338/0x758
[  135.781992] [<ffff2000084439a0>] __alloc_pages_direct_compact+0x80/0x858
[  135.788698] [<ffff2000084451d4>] __alloc_pages_nodemask+0x7bc/0x1b18
[  135.795055] [<ffff20000856af2c>] alloc_pages_vma+0x48c/0x848
[  135.800719] [<ffff2000085a9680>] do_huge_pmd_anonymous_page+0x2e0/0x1b48
[  135.807423] [<ffff2000084f153c>] __handle_mm_fault+0xe64/0x1de8
[  135.813346] [<ffff2000084f28cc>] handle_mm_fault+0x40c/0xbc0
[  135.819009] [<ffff2000084d48f8>] __get_user_pages+0x210/0x888
[  135.824760] [<ffff2000084d5920>] get_user_pages_unlocked+0x1e8/0x370
[  135.831119] [<ffff2000080e265c>] __gfn_to_pfn_memslot+0x634/0xae0
[  135.837220] [<ffff2000080e2b50>] gfn_to_pfn_prot+0x48/0x58
[  135.842711] [<ffff2000080fdc78>] user_mem_abort+0x380/0x7c8
[  135.848288] [<ffff2000080ff43c>] kvm_handle_guest_abort+0x2ec/0x7d0
[  135.854559] [<ffff200008104184>] handle_exit+0x244/0x508
[  135.859874] [<ffff2000080f39c8>] kvm_arch_vcpu_ioctl_run+0x890/0x1330
[  135.866318] [<ffff2000080d9ffc>] kvm_vcpu_ioctl+0x6bc/0xe30
[  135.871894] [<ffff20000863d1ac>] do_vfs_ioctl+0x194/0x14a0
[  135.877383] [<ffff20000863e560>] SyS_ioctl+0xa8/0xb8
[  135.882351] [<ffff200008084770>] el0_svc_naked+0x24/0x28

I'm able to trigger this fairly reliably by having a guest touch a large
amount of memory. I do this by running two instances of the command below,
each with GUESTRAM set to half of the host's physical memory, as I found
that a single instance was more likely to trigger the OOM killer.

	lkvm sandbox --console virtio -m ${GUESTRAM} --kernel Image \
	-p "memtest=1" -- true

(Note: lkvm sandbox assumes a directory called 'guests' exists in the
current working directory; you may need to create it first.)
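
For reference, one way to derive GUESTRAM as half of the host's RAM; the exact method used above is not shown, so this is an assumption. /proc/meminfo reports MemTotal in kB, and lkvm's -m option takes MiB:

```shell
# Assumption: half of host RAM, converted from kB (MemTotal) to MiB.
GUESTRAM=$(( $(awk '/^MemTotal:/ {print $2}' /proc/meminfo) / 2048 ))
echo "${GUESTRAM}"
```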

Thanks,
Mark.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: KVM/ARM: sleeping function called from invalid context
  2017-03-30 14:31 ` Mark Rutland
@ 2017-03-30 15:29   ` Mark Rutland
  -1 siblings, 0 replies; 13+ messages in thread
From: Mark Rutland @ 2017-03-30 15:29 UTC (permalink / raw)
  To: linux-arm-kernel, kvmarm, kvm; +Cc: marc.zyngier

On Thu, Mar 30, 2017 at 03:31:12PM +0100, Mark Rutland wrote:
> Hi,
> 
> I'm seeing the splat below when running KVM on an arm64 host with
> CONFIG_DEBUG_ATOMIC_SLEEP and CONFIG_LOCKDEP enabled.
> 
> I saw this on v4.11-rc1, and I can reproduce the problem on the current
> kvmarm master branch (563e2f5daa66fbc1).
>
> I've hacked noinlines into arch/arm/kvm/mmu.c in an attempt to get a
> better backtrace; without this, the report says the call is at
> arch/arm/kvm/mmu.c:299, which is somewhat confusing.

Looking again, that initial kernel was not a vanilla v4.11-rc1, and I am
*not* able to reproduce this issue with a vanilla v4.11-rc1.

I believe I had applied an earlier fix for the locking issue Suzuki
recently addressed, which was why my line numbers were off.

I *can* trigger this issue with the current kvmarm master, and the log I
posted is valid.

Sorry for the bogus info; I will be more careful next time.

Thanks,
Mark.

* RE: KVM/ARM: sleeping function called from invalid context
  2017-03-30 15:29   ` Mark Rutland
@ 2017-03-30 15:41   ` James Okken
  -1 siblings, 0 replies; 13+ messages in thread
From: James Okken @ 2017-03-30 15:41 UTC (permalink / raw)
  To: kvm, kvm-owner

Hi all,

I have a nagging question that I'm hoping someone can clear up for me. There is so much information and discussion out there regarding HVM, PVM, and PVHVM that it is hard to get a concrete understanding of what I'm actually using.

I thought I understood that my KVM deployment on CentOS 7.2 was creating PVM VMs.

Checking dmesg, lspci, and lsmod on the running CentOS 7 VMs, all indications are that the VMs are PVMs.

But when I look at the XML of the VMs I see: <type arch='x86_64' machine='pc-i440fx-rhel7.0.0'>hvm</type>

Does that <type>hvm</type> mean I am mistaken?

Thanks

* Re: KVM/ARM: sleeping function called from invalid context
  2017-03-30 15:29   ` Mark Rutland
@ 2017-03-30 15:41     ` Marc Zyngier
  -1 siblings, 0 replies; 13+ messages in thread
From: Marc Zyngier @ 2017-03-30 15:41 UTC (permalink / raw)
  To: Mark Rutland, linux-arm-kernel, kvmarm, kvm

On 30/03/17 16:29, Mark Rutland wrote:
> On Thu, Mar 30, 2017 at 03:31:12PM +0100, Mark Rutland wrote:
>> Hi,
>>
>> I'm seeing the splat below when running KVM on an arm64 host with
>> CONFIG_DEBUG_ATOMIC_SLEEP and CONFIG_LOCKDEP enabled.
>>
>> I saw this on v4.11-rc1, and I can reproduce the problem on the current
>> kvmarm master branch (563e2f5daa66fbc1).
>>
>> I've hacked noinlines into arch/arm/kvm/mmu.c in an attempt to get a
>> better backtrace; without this, the report says the call is at
>> arch/arm/kvm/mmu.c:299, which is somewhat confusing.
> 
> Looking again, that initial kernel was not a vanilla v4.11-rc1, and I am
> *not* able to reproduce this issue with a vanilla v4.11-rc1.
> 
> I believe I had applied an earlier fix for the locking issue Suzuki
> recently addressed, which was why my line numbers were off.
> 
> I *can* trigger this issue with the current kvmarm master, and the log I
> posted is valid.
> 
> Sorry for the bogus info; I will be more careful next time.

No worries, thanks Mark.

So here's my (very) superficial analysis of the issue:
- under memory pressure, we try to swap something out
- try_to_unmap_one takes a spinlock (via page_vma_mapped_walk)
- the MMU notifiers kick in with that spinlock held
- we take kvm->mmu_lock
- in unmap_stage2_range, we do a cond_resched_lock(&kvm->mmu_lock)
- we still hold the page_vma_mapped_walk spinlock, so might_sleep screams

I can see multiple ways of addressing this:
1) We track that we're coming in via an MMU notifier, and don't call
cond_resched_lock() in that case
2) We get rid of cond_resched_lock() entirely
3) We have a different code path for the MMU notifier that doesn't
involve cond_resched_lock()

Only (1) vaguely appeals to me, and I positively hate (3). We could
revert to (2), but it is likely to be helpful when tearing down large
ranges.

Another possibility is that the might_sleep() warning is just spurious,
and I think that Suzuki has a theory...

Thanks,

	M.
-- 
Jazz is not dead. It just smells funny...

* Re: KVM/ARM: sleeping function called from invalid context
  2017-03-30 15:41     ` Marc Zyngier
@ 2017-03-30 16:40       ` Suzuki K Poulose
  -1 siblings, 0 replies; 13+ messages in thread
From: Suzuki K Poulose @ 2017-03-30 16:40 UTC (permalink / raw)
  To: Marc Zyngier, Mark Rutland, linux-arm-kernel, kvmarm, kvm; +Cc: tglx, peterz

On 30/03/17 16:41, Marc Zyngier wrote:
> On 30/03/17 16:29, Mark Rutland wrote:
>> On Thu, Mar 30, 2017 at 03:31:12PM +0100, Mark Rutland wrote:
>>> Hi,
>>>
>>> I'm seeing the splat below when running KVM on an arm64 host with
>>> CONFIG_DEBUG_ATOMIC_SLEEP and CONFIG_LOCKDEP enabled.
>>>
>>> I saw this on v4.11-rc1, and I can reproduce the problem on the current
>>> kvmarm master branch (563e2f5daa66fbc1).
>>>
>>> I've hacked noinlines into arch/arm/kvm/mmu.c in an attempt to get a
>>> better backtrace; without this, the report says the call is at
>>> arch/arm/kvm/mmu.c:299, which is somewhat confusing.
>>
>> Looking again, that initial kernel was not a vanilla v4.11-rc1, and I am
>> *not* able to reproduce this issue with a vanilla v4.11-rc1.
>>
>> I believe I had applied an earlier fix for the locking issue Suzuki
>> recently addressed, which was why my line numbers were off.
>>
>> I *can* trigger this issue with the current kvmarm master, and the log I
>> posted is valid.
>>
>> Sorry for the bogus info; I will be more careful next time.
>
> No worries, thanks Mark.
>
> So here's my (very) superficial analysis of the issue:
> - Memory pressure, we try to swap out something
> - try_to_unmap_one takes a spinlock (via page_vma_mapped_walk)
> - MMU notifier kick in with the spinlock held
> - we take kvm->mmu_lock
> - in unmap_stage2_range, we do a cond_resched_lock(kvm->mmu_lock)
> - we still hold the page_vma_mapped_walk spinlock, might_sleep screams

Correct.

>
> I can see multiple ways of doing this:
> 1) We track that we're coming via an MMU notifier, and don't call
> cond_resched_lock() in that case
> 2) We get rid of cond_resched_lock()
> 3) we have a different code path for the MMU notifier that doesn't
> involve cond_resched_lock().
>
> Only (1) vaguely appeals to me, and I positively hate (3). We could
> revert to (2), but it is likely to be helpful when tearing down large
> ranges.
>
> Another possibility is that the might_sleep() warning is just spurious,
> and I think that Suzuki has a theory...

So cond_resched_lock() thinks that it could drop the lock and schedule. It
issues __might_sleep(PREEMPT_LOCK_OFFSET) to make sure the preempt_count is
as if there is only one (i.e., this) spin_lock preventing preemption, and it
could drop the lock after verifying that should_resched(PREEMPT_LOCK_OFFSET)
(and a few other checks) turns true. But since we are holding two spin_locks
on this rare path, the preempt count is PREEMPT_LOCK_OFFSET + 1 (as we do a
preempt_count_inc for each lock).

Now, ___might_sleep() checks whether the preempt_count exactly matches the
passed offset, using preempt_count_equals(), and goes ahead and complains if
it doesn't, even if the preempt_count is greater than the passed offset, in
which case should_resched() would deny the schedule operation and we would
simply skip it. So, I think, ___might_sleep should really check
if (preempt_count >= preempt_offset) to scream about it, of course making
sure that the callers honour the same convention.

To summarise: since we do have two locks held, we won't be able to reschedule
in this case, but might_sleep screams even though we are safe.

Suzuki

* Re: KVM/ARM: sleeping function called from invalid context
  2017-03-30 16:40       ` Suzuki K Poulose
@ 2017-03-30 16:43         ` Suzuki K Poulose
  -1 siblings, 0 replies; 13+ messages in thread
From: Suzuki K Poulose @ 2017-03-30 16:43 UTC (permalink / raw)
  To: Marc Zyngier, Mark Rutland, linux-arm-kernel, kvmarm, kvm; +Cc: peterz, tglx

On 30/03/17 17:40, Suzuki K Poulose wrote:
> On 30/03/17 16:41, Marc Zyngier wrote:
>> On 30/03/17 16:29, Mark Rutland wrote:
>>> On Thu, Mar 30, 2017 at 03:31:12PM +0100, Mark Rutland wrote:
>>>> Hi,
>>>>
>>>> I'm seeing the splat below when running KVM on an arm64 host with
>>>> CONFIG_DEBUG_ATOMIC_SLEEP and CONFIG_LOCKDEP enabled.
>>>>
>>>> I saw this on v4.11-rc1, and I can reproduce the problem on the current
>>>> kvmarm master branch (563e2f5daa66fbc1).
>>>>
>>>> I've hacked noinlines into arch/arm/kvm/mmu.c in an attempt to get a
>>>> better backtrace; without this, the report says the call is at
>>>> arch/arm/kvm/mmu.c:299, which is somewhat confusing.
>>>
>>> Looking again, that initial kernel was not a vanilla v4.11-rc1, and I am
>>> *not* able to reproduce this issue with a vanilla v4.11-rc1.
>>>
>>> I believe I had applied an earlier fix for the locking issue Suzuki
>>> recently addressed, which was why my line numbers were off.
>>>
>>> I *can* trigger this issue with the current kvmarm master, and the log I
>>> posted is valid.
>>>
>>> Sorry for the bogus info; I will be more careful next time.
>>
>> No worries, thanks Mark.
>>
>> So here's my (very) superficial analysis of the issue:
>> - Memory pressure, we try to swap out something
>> - try_to_unmap_one takes a spinlock (via page_vma_mapped_walk)
>> - MMU notifier kick in with the spinlock held
>> - we take kvm->mmu_lock
>> - in unmap_stage2_range, we do a cond_resched_lock(kvm->mmu_lock)
>> - we still hold the page_vma_mapped_walk spinlock, might_sleep screams
>
> Correct.
>
>>
>> I can see multiple ways of doing this:
>> 1) We track that we're coming via an MMU notifier, and don't call
>> cond_resched_lock() in that case
>> 2) We get rid of cond_resched_lock()
>> 3) we have a different code path for the MMU notifier that doesn't
>> involve cond_resched_lock().
>>
>> Only (1) vaguely appeals to me, and I positively hate (3). We could
>> revert to (2), but it is likely to be helpful when tearing down large
>> ranges.
>>
>> Another possibility is that the might_sleep() warning is just spurious,
>> and I think that Suzuki has a theory...
>
> So the cond_resched_lock() thinks that it could drop the lock and do a sched.
> So it issues __might_sleep(PREEMPT_LOCK_OFFSET) to make sure the preempt_count
> is as if there is only one {i.e, this} spin_lock preventing the preemption, and
> it could drop the lock after verifying should_resched(PREEMPT_LOCK_OFFSET) (and bit more
> other checks) turns true. But since we are holding two spin_locks in this rare path,
> the preempt_offset is PREEMPT_LOCK_OFFSET + 1 (as we do preempt_count_inc for each
> lock).
>
> Now, ___might_sleep() checks if the preempt_count exactly matches the passed count, using
> preempt_count_equals(), and goes ahead to complain if it doesn't, even if the preempt_count
> is greater than the passed count, in which case the should_resched() should deny the schedule
> operation and we skip it. So, I think, ___might_sleep should really check
> if (preempt_count >= preempt_offset) to scream about it, of course making sure that the callers

s/to scream/not to scream/

> honoring the same case.
>
> To summarise, since we do have two locks held, we won't be able to reschedule in this case
> but the might_sleep screams even if we are safe.
>

> Suzuki

* KVM/ARM: sleeping function called from invalid context
@ 2017-03-30 16:43         ` Suzuki K Poulose
  0 siblings, 0 replies; 13+ messages in thread
From: Suzuki K Poulose @ 2017-03-30 16:43 UTC (permalink / raw)
  To: linux-arm-kernel

On 30/03/17 17:40, Suzuki K Poulose wrote:
> On 30/03/17 16:41, Marc Zyngier wrote:
>> On 30/03/17 16:29, Mark Rutland wrote:
>>> On Thu, Mar 30, 2017 at 03:31:12PM +0100, Mark Rutland wrote:
>>>> Hi,
>>>>
>>>> I'm seeing the splat below when running KVM on an arm64 host with
>>>> CONFIG_DEBUG_ATOMIC_SLEEP and CONFIG_LOCKDEP enabled.
>>>>
>>>> I saw this on v4.11-rc1, and I can reproduce the problem on the current
>>>> kvmarm master branch (563e2f5daa66fbc1).
>>>>
>>>> I've hacked noinlines into arch/arm/kvm/mmu.c in an attempt to get a
>>>> better backtrace; without this, the report says the call is at
>>>> arch/arm/kvm/mmu.c:299, which is somewhat confusing.
>>>
>>> Looking again, that initial kernel was not a vanilla v4.11-rc1, and I am
>>> *not* able to reproduce this issue with a vanilla v4.11-rc1.
>>>
>>> I believe I had applied an earlier fix for the locking issue Suzuki
>>> recently addressed, which was why my line numbers were off.
>>>
>>> I *can* trigger this issue with the current kvmarm master, and the log I
>>> posted is valid.
>>>
>>> Sorry for the bogus info; I will be more careful next time.
>>
>> No worries, thanks Mark.
>>
>> So here's my (very) superficial analysis of the issue:
>> - Memory pressure, we try to swap out something
>> - try_to_unmap_one takes a spinlock (via page_vma_mapped_walk)
>> - MMU notifier kick in with the spinlock held
>> - we take kvm->mmu_lock
>> - in unmap_stage2_range, we do a cond_resched_lock(kvm->mmu_lock)
>> - we still hold the page_vma_mapped_walk spinlock, might_sleep screams
>
> Correct.
>
>>
>> I can see multiple ways of doing this:
>> 1) We track that we're coming via an MMU notifier, and don't call
>> cond_resched_lock() in that case
>> 2) We get rid of cond_resched_lock()
>> 3) we have a different code path for the MMU notifier that doesn't
>> involve cond_resched_lock().
>>
>> Only (1) vaguely appeals to me, and I positively hate (3). We could
>> revert to (2), but it is likely to be helpful when tearing down large
>> ranges.
>>
>> Another possibility is that the might_sleep() warning is just spurious,
>> and I think that Suzuki has a theory...
>
> So cond_resched_lock() thinks that it could drop the lock and reschedule.
> It issues __might_sleep(PREEMPT_LOCK_OFFSET) to check that the preempt_count
> is as if only one (i.e., this) spin_lock were preventing preemption, and it
> will drop the lock once should_resched(PREEMPT_LOCK_OFFSET) (plus a few
> other checks) turns true. But since we are holding two spin_locks on this
> rare path, the preempt_count is PREEMPT_LOCK_OFFSET + 1 (as we do a
> preempt_count_inc for each lock).
>
> Now, ___might_sleep() checks whether the preempt_count exactly matches the
> passed offset, using preempt_count_equals(), and goes ahead and complains
> if it doesn't, even when the preempt_count is greater than the passed
> offset; in that case should_resched() would deny the reschedule anyway and
> we would simply skip it. So I think ___might_sleep() should really check
> if (preempt_count >= preempt_offset) not to scream about it, of course
> making sure that the callers honor the same convention.
>
> To summarise: since we hold two locks, we won't be able to reschedule in
> this case, but might_sleep() screams even though we are safe.
>

> Suzuki

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: KVM/ARM: sleeping function called from invalid context
  2017-03-30 16:40       ` Suzuki K Poulose
@ 2017-03-30 17:36         ` Suzuki K Poulose
  -1 siblings, 0 replies; 13+ messages in thread
From: Suzuki K Poulose @ 2017-03-30 17:36 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: marc.zyngier, mark.rutland, peterz, kvmarm, kvm, linux-kernel,
	Suzuki K Poulose, Frederic Weisbecker, Peter Zijlstra,
	Arnd Bergmann

I have confirmed the theory using sample code like:

	spin_lock(&a);
	spin_lock(&b);
	cond_resched_lock(&b);
	spin_unlock(&b);
	spin_unlock(&a);

Also, the following patch solves the problem for me.


 ----8>-----

sched: Fix ___might_sleep preempt count checks

___might_sleep checks whether the preempt_count equals the passed-in
preempt_offset before issuing a warning. This can cause false warnings
when the preempt_count is greater than the requested offset. Fix the
check to handle this case.

e.g., the following code sequence triggers a false warning:

foo:
	spin_lock(&a);
	...
	bar() --------->     bar:
				spin_lock(&b);
				cond_resched_lock(&b);
				do_something();
				spin_unlock(&b);
		<-----------
	...
	spin_unlock(&a)

where locks a and b need not necessarily be related.

Cc: Marc Zyngier <marc.zyngier@arm.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arnd Bergmann <arnd@arndb.de>
Reported-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
---
 kernel/sched/core.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 3b31fc0..28842dc 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -6144,11 +6144,11 @@ void __init sched_init(void)
 }
 
 #ifdef CONFIG_DEBUG_ATOMIC_SLEEP
-static inline int preempt_count_equals(int preempt_offset)
+static inline int preempt_count_safe(int preempt_offset)
 {
 	int nested = preempt_count() + rcu_preempt_depth();
 
-	return (nested == preempt_offset);
+	return (nested >= preempt_offset);
 }
 
 void __might_sleep(const char *file, int line, int preempt_offset)
@@ -6179,7 +6179,7 @@ void ___might_sleep(const char *file, int line, int preempt_offset)
 	/* WARN_ON_ONCE() by default, no rate limit required: */
 	rcu_sleep_check();
 
-	if ((preempt_count_equals(preempt_offset) && !irqs_disabled() &&
+	if ((preempt_count_safe(preempt_offset) && !irqs_disabled() &&
 	     !is_idle_task(current)) ||
 	    system_state != SYSTEM_RUNNING || oops_in_progress)
 		return;
@@ -6205,7 +6205,7 @@ void ___might_sleep(const char *file, int line, int preempt_offset)
 	if (irqs_disabled())
 		print_irqtrace_events(current);
 	if (IS_ENABLED(CONFIG_DEBUG_PREEMPT)
-	    && !preempt_count_equals(preempt_offset)) {
+	    && !preempt_count_safe(preempt_offset)) {
 		pr_err("Preemption disabled at:");
 		print_ip_sym(preempt_disable_ip);
 		pr_cont("\n");
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 13+ messages in thread


end of thread, other threads:[~2017-03-30 17:37 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-03-30 14:31 KVM/ARM: sleeping function called from invalid context Mark Rutland
2017-03-30 14:31 ` Mark Rutland
2017-03-30 15:29 ` Mark Rutland
2017-03-30 15:29   ` Mark Rutland
2017-03-30 15:41   ` James Okken
2017-03-30 15:41   ` Marc Zyngier
2017-03-30 15:41     ` Marc Zyngier
2017-03-30 16:40     ` Suzuki K Poulose
2017-03-30 16:40       ` Suzuki K Poulose
2017-03-30 16:43       ` Suzuki K Poulose
2017-03-30 16:43         ` Suzuki K Poulose
2017-03-30 17:36       ` Suzuki K Poulose
2017-03-30 17:36         ` Suzuki K Poulose
