* KVM CPU hotplug notifier triggers BUG_ON on arm64
@ 2023-07-01 12:50 ` Kristina Martsenko
  0 siblings, 0 replies; 12+ messages in thread
From: Kristina Martsenko @ 2023-07-01 12:50 UTC (permalink / raw)
  To: Marc Zyngier, Oliver Upton, isaku.yamahata, seanjc, pbonzini
  Cc: kvmarm, kvm, linux-arm-kernel, James Morse

Hi,

When I try to online a CPU on arm64 while a KVM guest is running, I hit a
BUG_ON(preemptible()) (as well as a WARN_ON). See below for the full log.

This is on kvmarm/next, but seems to have been broken since 6.3. Bisecting it
points at commit:

  0bf50497f03b ("KVM: Drop kvm_count_lock and instead protect kvm_usage_count with kvm_lock")

Thanks,
Kristina

-->8--

/ # /root/lkvm-static run /root/kimgs/Image -c 1 --console virtio -p "earlycon loglevel=9" -d /root/kvm-rootfs/ &
/ #   # lkvm run -k /root/kimgs/Image -m 256 -c 1 --name guest-112

/ # echo 0 > /sys/devices/system/cpu/cpu1/online
[ 2060.783263] psci: CPU1 killed (polled 0 ms)
/ # echo 1 > /sys/devices/system/cpu/cpu1/online
[ 2061.070582] Detected PIPT I-cache on CPU1
[ 2061.070800] GICv3: CPU1: found redistributor 100 region 0:0x000000002f120000
[ 2061.070985] CPU1: Booted secondary processor 0x0000000100 [0x410fd0f0]
[ 2061.071167] ------------[ cut here ]------------
[ 2061.071233] WARNING: CPU: 1 PID: 18 at arch/arm64/kernel/cpufeature.c:3228 this_cpu_has_cap+0x14/0x60
[ 2061.071403] Modules linked in:
[ 2061.071478] CPU: 1 PID: 18 Comm: cpuhp/1 Not tainted 6.4.0-rc3-00072-g192df2aa0113 #80
[ 2061.071606] Hardware name: FVP Base RevC (DT)
[ 2061.071678] pstate: 80000005 (Nzcv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[ 2061.071807] pc : this_cpu_has_cap+0x14/0x60
[ 2061.071922] lr : cpu_hyp_init_context+0x100/0x168
[ 2061.072028] sp : ffff800082a4bd10
[ 2061.072091] x29: ffff800082a4bd10 x28: 0000000000000000 x27: 0000000000000000
[ 2061.072270] x26: 0000000000000000 x25: ffff8000801227c8 x24: 0000000000000001
[ 2061.072447] x23: ffff800081c94008 x22: ffff80008234c7d0 x21: 0000000000000001
[ 2061.072626] x20: 0000000000000001 x19: ffff800081c958b0 x18: 0000000000000006
[ 2061.072803] x17: 000000040044ffff x16: 00500075b5503510 x15: ffff800082a0b920
[ 2061.072984] x14: 0000000000000000 x13: ffff800082351aa0 x12: 00000000000004a4
[ 2061.073159] x11: 0000000000000001 x10: 0000000000000a60 x9 : ffff800082a4bce0
[ 2061.073337] x8 : ffff000800240ac0 x7 : ffff00087f768040 x6 : ffff80008234b010
[ 2061.073518] x5 : ffff00087f75f970 x4 : 0000000000000000 x3 : ffff80008012c140
[ 2061.073696] x2 : 0000000000000005 x1 : 0000000000000000 x0 : 0000000000000039
[ 2061.073868] Call trace:
[ 2061.073923]  this_cpu_has_cap+0x14/0x60
[ 2061.074038]  _kvm_arch_hardware_enable+0x48/0xa0
[ 2061.074148]  kvm_arch_hardware_enable+0x2c/0x60
[ 2061.074263]  __hardware_enable_nolock+0x40/0x78
[ 2061.074388]  kvm_online_cpu+0x4c/0x6c
[ 2061.074507]  cpuhp_invoke_callback+0x100/0x1f4
[ 2061.074631]  cpuhp_thread_fun+0xac/0x194
[ 2061.074754]  smpboot_thread_fn+0x224/0x248
[ 2061.074893]  kthread+0x114/0x118
[ 2061.074996]  ret_from_fork+0x10/0x20
[ 2061.075104] ---[ end trace 0000000000000000 ]---
[ 2061.075254] ------------[ cut here ]------------
[ 2061.075316] kernel BUG at arch/arm64/kvm/vgic/vgic-init.c:517!
[ 2061.075405] Internal error: Oops - BUG: 00000000f2000800 [#1] PREEMPT SMP
[ 2061.075503] Modules linked in:
[ 2061.075580] CPU: 1 PID: 18 Comm: cpuhp/1 Tainted: G        W          6.4.0-rc3-00072-g192df2aa0113 #80
[ 2061.075718] Hardware name: FVP Base RevC (DT)
[ 2061.075790] pstate: 80000005 (Nzcv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[ 2061.075922] pc : kvm_vgic_init_cpu_hardware+0x80/0x84
[ 2061.076061] lr : _kvm_arch_hardware_enable+0x94/0xa0
[ 2061.076169] sp : ffff800082a4bd40
[ 2061.076236] x29: ffff800082a4bd40 x28: 0000000000000000 x27: 0000000000000000
[ 2061.076412] x26: 0000000000000000 x25: ffff8000801227c8 x24: 0000000000000001
[ 2061.076588] x23: ffff800081c94008 x22: ffff80008234c7d0 x21: 0000000000000001
[ 2061.076768] x20: 0000000000000001 x19: ffff800081c958b0 x18: 0000000000000006
[ 2061.076944] x17: 000000040044ffff x16: 00500075b5503510 x15: ffff800082a0b920
[ 2061.077126] x14: 0000000000000000 x13: ffff800082351aa0 x12: 00000000000004a4
[ 2061.077303] x11: 0000000000000001 x10: 0000000000000a60 x9 : ffff800082a4bce0
[ 2061.077479] x8 : ffff000800240ac0 x7 : ffff00087f768040 x6 : ffff80008234b010
[ 2061.077660] x5 : ffff00087f75f970 x4 : 0000000000000001 x3 : ffff800081ca1000
[ 2061.077838] x2 : ffff800081c958c0 x1 : 0000000000000008 x0 : 0000000000000000
[ 2061.078013] Call trace:
[ 2061.078068]  kvm_vgic_init_cpu_hardware+0x80/0x84
[ 2061.078209]  kvm_arch_hardware_enable+0x2c/0x60
[ 2061.078324]  __hardware_enable_nolock+0x40/0x78
[ 2061.078450]  kvm_online_cpu+0x4c/0x6c
[ 2061.078568]  cpuhp_invoke_callback+0x100/0x1f4
[ 2061.078692]  cpuhp_thread_fun+0xac/0x194
[ 2061.078816]  smpboot_thread_fn+0x224/0x248
[ 2061.078955]  kthread+0x114/0x118
[ 2061.079057]  ret_from_fork+0x10/0x20
[ 2061.079199] Code: d50323bf d65f03c0 d53b4220 373ffc80 (d4210000) 
[ 2061.079294] ---[ end trace 0000000000000000 ]---
[ 2061.288961] pstore: backend (efi_pstore) writing error (-5)
[ 2061.289043] note: cpuhp/1[18] exited with irqs disabled
[ 2061.289151] note: cpuhp/1[18] exited with preempt_count 1
[ 2061.289452] ------------[ cut here ]------------
[ 2061.289516] WARNING: CPU: 1 PID: 0 at kernel/context_tracking.c:128 ct_kernel_exit.constprop.0+0x98/0xa0
[ 2061.289712] Modules linked in:
[ 2061.289790] CPU: 1 PID: 0 Comm: swapper/1 Tainted: G      D W          6.4.0-rc3-00072-g192df2aa0113 #80
[ 2061.289928] Hardware name: FVP Base RevC (DT)
[ 2061.290000] pstate: 200003c5 (nzCv DAIF -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[ 2061.290131] pc : ct_kernel_exit.constprop.0+0x98/0xa0
[ 2061.290275] lr : ct_idle_enter+0x10/0x1c
[ 2061.290407] sp : ffff800082a0bdd0
[ 2061.290473] x29: ffff800082a0bdd0 x28: 0000000000000000 x27: 0000000000000000
[ 2061.290650] x26: 0000000000000000 x25: ffff000800238000 x24: 0000000000000000
[ 2061.290826] x23: 0000000000000000 x22: ffff000800238000 x21: ffff800082339b48
[ 2061.291006] x20: ffff800082339a40 x19: ffff00087f764a18 x18: 0000000000000006
[ 2061.291185] x17: 0000000000000008 x16: ffff800082b7bff0 x15: ffff800082a4b4d0
[ 2061.291365] x14: 0000000000000059 x13: 0000000000000059 x12: 0000000000000001
[ 2061.291539] x11: 0000000000000001 x10: 0000000000000a60 x9 : ffff800082a0bd30
[ 2061.291715] x8 : ffff000800238ac0 x7 : 0000000000000000 x6 : 000000306bbc2709
[ 2061.291892] x5 : 4000000000000002 x4 : ffff8007fdac9000 x3 : ffff800082a0bdd0
[ 2061.292073] x2 : 4000000000000000 x1 : ffff800081c9ba18 x0 : ffff800081c9ba18
[ 2061.292255] Call trace:
[ 2061.292310]  ct_kernel_exit.constprop.0+0x98/0xa0
[ 2061.292455]  ct_idle_enter+0x10/0x1c
[ 2061.292589]  default_idle_call+0x1c/0x3c
[ 2061.292728]  do_idle+0x20c/0x264
[ 2061.292840]  cpu_startup_entry+0x24/0x2c
[ 2061.292958]  secondary_start_kernel+0x130/0x154
[ 2061.293090]  __secondary_switched+0xb8/0xbc
[ 2061.293214] ---[ end trace 0000000000000000 ]---



* Re: KVM CPU hotplug notifier triggers BUG_ON on arm64
  2023-07-01 12:50 ` Kristina Martsenko
@ 2023-07-01 17:42   ` Oliver Upton
  -1 siblings, 0 replies; 12+ messages in thread
From: Oliver Upton @ 2023-07-01 17:42 UTC (permalink / raw)
  To: Kristina Martsenko
  Cc: Marc Zyngier, isaku.yamahata, seanjc, pbonzini, kvmarm, kvm,
	linux-arm-kernel, James Morse

Hi Kristina,

Thanks for the bug report.

On Sat, Jul 01, 2023 at 01:50:52PM +0100, Kristina Martsenko wrote:
> Hi,
> 
> When I try to online a CPU on arm64 while a KVM guest is running, I hit a
> BUG_ON(preemptible()) (as well as a WARN_ON). See below for the full log.
> 
> This is on kvmarm/next, but seems to have been broken since 6.3. Bisecting it
> points at commit:
> 
>   0bf50497f03b ("KVM: Drop kvm_count_lock and instead protect kvm_usage_count with kvm_lock")

Makes sense. We were using a spinlock before, which implicitly disables
preemption.

Well, one way to hack around the problem would be to just cram
preempt_{disable,enable}() into kvm_arch_hardware_disable(), but that's
kinda gross in the context of cpuhp which isn't migratable in the first
place. Let me have a look...

--
Thanks,
Oliver


* Re: KVM CPU hotplug notifier triggers BUG_ON on arm64
  2023-07-01 17:42   ` Oliver Upton
@ 2023-07-03  9:45     ` Marc Zyngier
  -1 siblings, 0 replies; 12+ messages in thread
From: Marc Zyngier @ 2023-07-03  9:45 UTC (permalink / raw)
  To: Oliver Upton, Kristina Martsenko
  Cc: isaku.yamahata, seanjc, pbonzini, kvmarm, kvm, linux-arm-kernel,
	James Morse

On Sat, 01 Jul 2023 18:42:28 +0100,
Oliver Upton <oliver.upton@linux.dev> wrote:
> 
> Hi Kristina,
> 
> Thanks for the bug report.
> 
> On Sat, Jul 01, 2023 at 01:50:52PM +0100, Kristina Martsenko wrote:
> > Hi,
> > 
> > When I try to online a CPU on arm64 while a KVM guest is running, I hit a
> > BUG_ON(preemptible()) (as well as a WARN_ON). See below for the full log.
> > 
> > This is on kvmarm/next, but seems to have been broken since 6.3. Bisecting it
> > points at commit:
> > 
> >   0bf50497f03b ("KVM: Drop kvm_count_lock and instead protect kvm_usage_count with kvm_lock")
> 
> Makes sense. We were using a spinlock before, which implicitly disables
> preemption.
> 
> Well, one way to hack around the problem would be to just cram
> preempt_{disable,enable}() into kvm_arch_hardware_disable(), but that's
> kinda gross in the context of cpuhp which isn't migratable in the first
> place. Let me have a look...

An alternative would be to replace the preemptible() checks with one
that looks at the migration state, but I'm not sure that's much better
(it certainly looks more costly).

There is also the fact that most of our per-CPU accessors are already
using preemption disabling, and this code has a bunch of them. So I'm
not sure there is a lot to be gained from not disabling preemption
upfront.

Anyway, as I was able to reproduce the issue under NV, I tested the
hack below. If anything, I expect it to be a reasonable fix for
6.3/6.4, at least until we come up with a better approach.

Thanks,

	M.

diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index aaeae1145359..a28c4ffe4932 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -1894,8 +1894,17 @@ static void _kvm_arch_hardware_enable(void *discard)
 
 int kvm_arch_hardware_enable(void)
 {
-	int was_enabled = __this_cpu_read(kvm_arm_hardware_enabled);
+	int was_enabled;
 
+	/*
+	 * Most calls to this function are made with migration
+	 * disabled, but not with preemption disabled. The former is
+	 * enough to ensure correctness, but most of the helpers
+	 * expect the latter and will throw a tantrum otherwise.
+	 */
+	preempt_disable();
+
+	was_enabled = __this_cpu_read(kvm_arm_hardware_enabled);
 	_kvm_arch_hardware_enable(NULL);
 
 	if (!was_enabled) {
@@ -1903,6 +1912,8 @@ int kvm_arch_hardware_enable(void)
 		kvm_timer_cpu_up();
 	}
 
+	preempt_enable();
+
 	return 0;
 }
 



-- 
Without deviation from the norm, progress is not possible.


* Re: KVM CPU hotplug notifier triggers BUG_ON on arm64
  2023-07-03  9:45     ` Marc Zyngier
@ 2023-07-03 10:36       ` Kristina Martsenko
  -1 siblings, 0 replies; 12+ messages in thread
From: Kristina Martsenko @ 2023-07-03 10:36 UTC (permalink / raw)
  To: Marc Zyngier, Oliver Upton
  Cc: isaku.yamahata, seanjc, pbonzini, kvmarm, kvm, linux-arm-kernel,
	James Morse

On 03/07/2023 10:45, Marc Zyngier wrote:
> On Sat, 01 Jul 2023 18:42:28 +0100,
> Oliver Upton <oliver.upton@linux.dev> wrote:
>>
>> Hi Kristina,
>>
>> Thanks for the bug report.
>>
>> On Sat, Jul 01, 2023 at 01:50:52PM +0100, Kristina Martsenko wrote:
>>> Hi,
>>>
>>> When I try to online a CPU on arm64 while a KVM guest is running, I hit a
>>> BUG_ON(preemptible()) (as well as a WARN_ON). See below for the full log.
>>>
>>> This is on kvmarm/next, but seems to have been broken since 6.3. Bisecting it
>>> points at commit:
>>>
>>>   0bf50497f03b ("KVM: Drop kvm_count_lock and instead protect kvm_usage_count with kvm_lock")
>>
>> Makes sense. We were using a spinlock before, which implicitly disables
>> preemption.
>>
>> Well, one way to hack around the problem would be to just cram
>> preempt_{disable,enable}() into kvm_arch_hardware_disable(), but that's
>> kinda gross in the context of cpuhp which isn't migratable in the first
>> place. Let me have a look...
> 
> An alternative would be to replace the preemptible() checks with one
> that looks at the migration state, but I'm not sure that's much better
> (it certainly looks more costly).
> 
> There is also the fact that most of our per-CPU accessors are already
> using preemption disabling, and this code has a bunch of them. So I'm
> not sure there is a lot to be gained from not disabling preemption
> upfront.
> 
> Anyway, as I was able to reproduce the issue under NV, I tested the
> hack below. If anything, I expect it to be a reasonable fix for
> 6.3/6.4, at least until we come up with a better approach.
> 
> Thanks,
> 
> 	M.
> 
> diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
> index aaeae1145359..a28c4ffe4932 100644
> --- a/arch/arm64/kvm/arm.c
> +++ b/arch/arm64/kvm/arm.c
> @@ -1894,8 +1894,17 @@ static void _kvm_arch_hardware_enable(void *discard)
>  
>  int kvm_arch_hardware_enable(void)
>  {
> -	int was_enabled = __this_cpu_read(kvm_arm_hardware_enabled);
> +	int was_enabled;
>  
> +	/*
> +	 * Most calls to this function are made with migration
> +	 * disabled, but not with preemption disabled. The former is
> +	 * enough to ensure correctness, but most of the helpers
> +	 * expect the latter and will throw a tantrum otherwise.
> +	 */
> +	preempt_disable();
> +
> +	was_enabled = __this_cpu_read(kvm_arm_hardware_enabled);
>  	_kvm_arch_hardware_enable(NULL);
>  
>  	if (!was_enabled) {
> @@ -1903,6 +1912,8 @@ int kvm_arch_hardware_enable(void)
>  		kvm_timer_cpu_up();
>  	}
>  
> +	preempt_enable();
> +
>  	return 0;
>  }

This fixes the issue for me.

Tested-by: Kristina Martsenko <kristina.martsenko@arm.com>

Thanks,
Kristina



* Re: KVM CPU hotplug notifier triggers BUG_ON on arm64
  2023-07-03  9:45     ` Marc Zyngier
@ 2023-07-03 16:02       ` Oliver Upton
  -1 siblings, 0 replies; 12+ messages in thread
From: Oliver Upton @ 2023-07-03 16:02 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: Kristina Martsenko, isaku.yamahata, seanjc, pbonzini, kvmarm,
	kvm, linux-arm-kernel, James Morse

Hey Marc,

On Mon, Jul 03, 2023 at 10:45:26AM +0100, Marc Zyngier wrote:
> On Sat, 01 Jul 2023 18:42:28 +0100, Oliver Upton <oliver.upton@linux.dev> wrote:
> > Well, one way to hack around the problem would be to just cram
> > preempt_{disable,enable}() into kvm_arch_hardware_disable(), but that's
> > kinda gross in the context of cpuhp which isn't migratable in the first
> > place. Let me have a look...

Heh, I should've mentioned I'm on holiday until Thursday.

> An alternative would be to replace the preemptible() checks with one
> that looks at the migration state, but I'm not sure that's much better
> (it certainly looks more costly).
> 
> There is also the fact that most of our per-CPU accessors are already
> using preemption disabling, and this code has a bunch of them. So I'm
> not sure there is a lot to be gained from not disabling preemption
> upfront.
> 
> Anyway, as I was able to reproduce the issue under NV, I tested the
> hack below. If anything, I expect it to be a reasonable fix for
> 6.3/6.4, at least until we come up with a better approach.

Yeah, I'm fine with a hack like this. Do you want to send this out as a
patch?

--
Thanks,
Oliver

> diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
> index aaeae1145359..a28c4ffe4932 100644
> --- a/arch/arm64/kvm/arm.c
> +++ b/arch/arm64/kvm/arm.c
> @@ -1894,8 +1894,17 @@ static void _kvm_arch_hardware_enable(void *discard)
>  
>  int kvm_arch_hardware_enable(void)
>  {
> -	int was_enabled = __this_cpu_read(kvm_arm_hardware_enabled);
> +	int was_enabled;
>  
> +	/*
> +	 * Most calls to this function are made with migration
> +	 * disabled, but not with preemption disabled. The former is
> +	 * enough to ensure correctness, but most of the helpers
> +	 * expect the latter and will throw a tantrum otherwise.
> +	 */
> +	preempt_disable();
> +
> +	was_enabled = __this_cpu_read(kvm_arm_hardware_enabled);
>  	_kvm_arch_hardware_enable(NULL);
>  
>  	if (!was_enabled) {
> @@ -1903,6 +1912,8 @@ int kvm_arch_hardware_enable(void)
>  		kvm_timer_cpu_up();
>  	}
>  
> +	preempt_enable();
> +
>  	return 0;
>  }
>  
> 
> 
> 
> -- 
> Without deviation from the norm, progress is not possible.


* Re: KVM CPU hotplug notifier triggers BUG_ON on arm64
  2023-07-03 16:02       ` Oliver Upton
@ 2023-07-03 16:38         ` Marc Zyngier
  -1 siblings, 0 replies; 12+ messages in thread
From: Marc Zyngier @ 2023-07-03 16:38 UTC (permalink / raw)
  To: Oliver Upton
  Cc: Kristina Martsenko, isaku.yamahata, seanjc, pbonzini, kvmarm,
	kvm, linux-arm-kernel, James Morse

On Mon, 03 Jul 2023 17:02:30 +0100,
Oliver Upton <oliver.upton@linux.dev> wrote:
> 
> Hey Marc,
> 
> On Mon, Jul 03, 2023 at 10:45:26AM +0100, Marc Zyngier wrote:
> > On Sat, 01 Jul 2023 18:42:28 +0100, Oliver Upton <oliver.upton@linux.dev> wrote:
> > > Well, one way to hack around the problem would be to just cram
> > > preempt_{disable,enable}() into kvm_arch_hardware_disable(), but that's
> > > kinda gross in the context of cpuhp which isn't migratable in the first
> > > place. Let me have a look...
> 
> Heh, I should've mentioned I'm on holiday until Thursday.

No problem, happy to keep an eye on stuff in the meantime.

> 
> > An alternative would be to replace the preemptible() checks with one
> > that looks at the migration state, but I'm not sure that's much better
> > (it certainly looks more costly).
> > 
> > There is also the fact that most of our per-CPU accessors are already
> > using preemption disabling, and this code has a bunch of them. So I'm
> > not sure there is a lot to be gained from not disabling preemption
> > upfront.
> > 
> > Anyway, as I was able to reproduce the issue under NV, I tested the
> > hack below. If anything, I expect it to be a reasonable fix for
> > 6.3/6.4, and until we come up with a better approach.
> 
> Yeah, I'm fine with a hack like this. Do you want to send this out as a
> patch?

Now sent as 20230703163548.1498943-1-maz@kernel.org.

Enjoy your time off!

	M.

-- 
Without deviation from the norm, progress is not possible.

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2023-07-03 16:38 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-07-01 12:50 KVM CPU hotplug notifier triggers BUG_ON on arm64 Kristina Martsenko
2023-07-01 12:50 ` Kristina Martsenko
2023-07-01 17:42 ` Oliver Upton
2023-07-01 17:42   ` Oliver Upton
2023-07-03  9:45   ` Marc Zyngier
2023-07-03  9:45     ` Marc Zyngier
2023-07-03 10:36     ` Kristina Martsenko
2023-07-03 10:36       ` Kristina Martsenko
2023-07-03 16:02     ` Oliver Upton
2023-07-03 16:02       ` Oliver Upton
2023-07-03 16:38       ` Marc Zyngier
2023-07-03 16:38         ` Marc Zyngier
