kvm.vger.kernel.org archive mirror
* [Bug 216388] New: On Host, kernel errors in KVM, on guests, it shows CPU stalls
@ 2022-08-21  7:37 bugzilla-daemon
  2022-08-22 17:50 ` Sean Christopherson
                   ` (22 more replies)
  0 siblings, 23 replies; 27+ messages in thread
From: bugzilla-daemon @ 2022-08-21  7:37 UTC (permalink / raw)
  To: kvm

https://bugzilla.kernel.org/show_bug.cgi?id=216388

            Bug ID: 216388
           Summary: On Host, kernel errors in KVM, on guests, it shows CPU
                    stalls
           Product: Virtualization
           Version: unspecified
    Kernel Version: 5.19.0 / 5.19.1 / 5.19.2
          Hardware: All
                OS: Linux
              Tree: Mainline
            Status: NEW
          Severity: high
          Priority: P1
         Component: kvm
          Assignee: virtualization_kvm@kernel-bugs.osdl.org
          Reporter: nanook@eskimo.com
        Regression: No

Created attachment 301614
  --> https://bugzilla.kernel.org/attachment.cgi?id=301614&action=edit
The configuration file used to compile this kernel.

This behavior has persisted across 5.19.0, 5.19.1, and 5.19.2.  While the
kernel I am taking this example from is tainted (owing to using Intel
development drivers for GPU virtualization), it is also occurring on
non-tainted kernels on servers with no development or third party modules
installed.

INFO: task CPU 2/KVM:2343 blocked for more than 1228 seconds.
[207177.050049]       Tainted: G     U    I       5.19.2 #1
[207177.050050] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[207177.050051] task:CPU 2/KVM       state:D stack:    0 pid: 2343 ppid:     1 flags:0x00000002
[207177.050054] Call Trace:
[207177.050055]  <TASK>
[207177.050056]  __schedule+0x359/0x1400
[207177.050060]  ? kvm_mmu_page_fault+0x1ee/0x980
[207177.050062]  ? kvm_set_msr_common+0x31f/0x1060
[207177.050065]  schedule+0x5f/0x100
[207177.050066]  schedule_preempt_disabled+0x15/0x30
[207177.050068]  __mutex_lock.constprop.0+0x4e2/0x750
[207177.050070]  ? aa_file_perm+0x124/0x4f0
[207177.050071]  __mutex_lock_slowpath+0x13/0x20
[207177.050072]  mutex_lock+0x25/0x30
[207177.050075]  intel_vgpu_emulate_mmio_read+0x5d/0x3b0 [kvmgt]
[207177.050084]  intel_vgpu_rw+0xb8/0x1c0 [kvmgt]
[207177.050091]  intel_vgpu_read+0x20d/0x250 [kvmgt]
[207177.050097]  vfio_device_fops_read+0x1f/0x40
[207177.050100]  vfs_read+0x9b/0x160
[207177.050102]  __x64_sys_pread64+0x93/0xd0
[207177.050104]  do_syscall_64+0x58/0x80
[207177.050106]  ? kvm_on_user_return+0x84/0xe0
[207177.050107]  ? fire_user_return_notifiers+0x37/0x70
[207177.050109]  ? exit_to_user_mode_prepare+0x41/0x200
[207177.050111]  ? syscall_exit_to_user_mode+0x1b/0x40
[207177.050112]  ? do_syscall_64+0x67/0x80
[207177.050114]  ? irqentry_exit+0x54/0x70
[207177.050115]  ? sysvec_call_function_single+0x4b/0xa0
[207177.050116]  entry_SYSCALL_64_after_hwframe+0x63/0xcd
[207177.050118] RIP: 0033:0x7ff51131293f
[207177.050119] RSP: 002b:00007ff4ddffa260 EFLAGS: 00000293 ORIG_RAX: 0000000000000011
[207177.050121] RAX: ffffffffffffffda RBX: 00005599a6835420 RCX: 00007ff51131293f
[207177.050122] RDX: 0000000000000004 RSI: 00007ff4ddffa2a8 RDI: 0000000000000027
[207177.050123] RBP: 0000000000000004 R08: 0000000000000000 R09: 00000000ffffffff
[207177.050124] R10: 0000000000065f10 R11: 0000000000000293 R12: 0000000000065f10
[207177.050124] R13: 00005599a6835330 R14: 0000000000000004 R15: 0000000000065f10
[207177.050126]  </TASK>

     I am seeing this on Intel i7-6700k, i7-6850k, and i7-9700k platforms.

     This did not happen on 5.17 kernels, and 5.18 kernels never ran stable
enough on my platforms to actually run them for more than a few minutes.

     Likewise, 6.0-rc1 has not been stable enough to run in production.  After
less than three hours running on my workstation it locked up hard; even the
magic SysRq key was unresponsive, and only power cycling the machine brought
it back.

     The operating system in use for the host on all machines is Ubuntu 22.04.

     Guests vary, with Ubuntu 22.04 being the most common, but Mint, Debian,
Manjaro, CentOS, Fedora, Scientific Linux, Zorin, and Windows are also in use.

     I see the same issue manifest on platforms running only Ubuntu guests as
with guests of varying operating systems.  

     The configuration file I used to compile this kernel is attached.  I
compiled it with gcc 12.1.0.

     This behavior does not manifest itself instantly; typically the machine
needs to be running 3-7 days before it does.  Once it does, guests keep
stalling, and restarting libvirtd does not help.  The only thing that seems to
help is a hard reboot of the physical host.  For this reason I believe the
issue lies strictly with the host and not the guests.

     I have set the severity to high since it completely interrupts service.
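
For anyone collecting these reports across several hosts, the hung-task
header line in the log above can be picked out mechanically.  What follows is
only a minimal sketch in Python; the function name find_hung_tasks and the
regular expression are illustrative assumptions, not part of any kernel or
libvirt tooling, and the pattern is based solely on the one message format
shown in this report.

```python
import re

# Matches the hung-task header line the kernel prints, for example:
#   INFO: task CPU 2/KVM:2343 blocked for more than 1228 seconds.
# The task name may contain spaces and slashes, so the pid is taken from
# the last ":<digits>" that precedes " blocked".
HUNG_TASK_RE = re.compile(
    r"INFO: task (?P<comm>.+):(?P<pid>\d+) "
    r"blocked for more than (?P<secs>\d+) seconds\."
)


def find_hung_tasks(log_text):
    """Return a list of (comm, pid, seconds) for each hung-task header."""
    return [
        (m.group("comm"), int(m.group("pid")), int(m.group("secs")))
        for m in HUNG_TASK_RE.finditer(log_text)
    ]


# Sample line copied from the report above.
sample = "INFO: task CPU 2/KVM:2343 blocked for more than 1228 seconds.\n"

for comm, pid, secs in find_hung_tasks(sample):
    print(f"{comm} (pid {pid}) blocked for more than {secs}s")
```

Feeding it saved dmesg output from each affected host would give a quick list
of which tasks stalled and for how long, which may help when comparing
machines.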

-- 
You may reply to this email to add a comment.

You are receiving this mail because:
You are watching the assignee of the bug.


* Re: [Bug 216388] New: On Host, kernel errors in KVM, on guests, it shows CPU stalls
  2022-08-21  7:37 [Bug 216388] New: On Host, kernel errors in KVM, on guests, it shows CPU stalls bugzilla-daemon
@ 2022-08-22 17:50 ` Sean Christopherson
  2022-08-22 23:21   ` Zhenyu Wang
  2022-08-22 17:50 ` [Bug 216388] " bugzilla-daemon
                   ` (21 subsequent siblings)
  22 siblings, 1 reply; 27+ messages in thread
From: Sean Christopherson @ 2022-08-22 17:50 UTC (permalink / raw)
  To: bugzilla-daemon; +Cc: kvm, Zhenyu Wang, Zhi Wang, intel-gvt-dev, intel-gfx

+GVT folks

On Sun, Aug 21, 2022, bugzilla-daemon@kernel.org wrote:
> https://bugzilla.kernel.org/show_bug.cgi?id=216388
> 
>             Bug ID: 216388
>            Summary: On Host, kernel errors in KVM, on guests, it shows CPU
>                     stalls
>            Product: Virtualization
>            Version: unspecified
>     Kernel Version: 5.19.0 / 5.19.1 / 5.19.2
>           Hardware: All
>                 OS: Linux
>               Tree: Mainline
>             Status: NEW
>           Severity: high
>           Priority: P1
>          Component: kvm
>           Assignee: virtualization_kvm@kernel-bugs.osdl.org
>           Reporter: nanook@eskimo.com
>         Regression: No
> 
> Created attachment 301614
>   --> https://bugzilla.kernel.org/attachment.cgi?id=301614&action=edit
> The configuration file used to compile this kernel.
> 
> This behavior has persisted across 5.19.0, 5.19.1, and 5.19.2.  While the
> kernel I am taking this example from is tainted (owing to using Intel
> development drivers for GPU virtualization), it is also occurring on
> non-tainted kernels on servers with no development or third party modules
> installed.
> 
> INFO: task CPU 2/KVM:2343 blocked for more than 1228 seconds.
> [207177.050049]       Tainted: G     U    I       5.19.2 #1
> [207177.050050] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables
> this message.
> [207177.050051] task:CPU 2/KVM       state:D stack:    0 pid: 2343 ppid:     1
> flags:0x00000002
> [207177.050054] Call Trace:
> [207177.050055]  <TASK>
> [207177.050056]  __schedule+0x359/0x1400
> [207177.050060]  ? kvm_mmu_page_fault+0x1ee/0x980
> [207177.050062]  ? kvm_set_msr_common+0x31f/0x1060
> [207177.050065]  schedule+0x5f/0x100
> [207177.050066]  schedule_preempt_disabled+0x15/0x30
> [207177.050068]  __mutex_lock.constprop.0+0x4e2/0x750
> [207177.050070]  ? aa_file_perm+0x124/0x4f0
> [207177.050071]  __mutex_lock_slowpath+0x13/0x20
> [207177.050072]  mutex_lock+0x25/0x30
> [207177.050075]  intel_vgpu_emulate_mmio_read+0x5d/0x3b0 [kvmgt]

This isn't a KVM problem, it's a KVMGT problem (despite the name, KVMGT is very
much not KVM).
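
The attribution above rests on the "[kvmgt]" tag at the end of the frame: the
kernel appends the owning module's name in brackets to frames that live in a
loadable module, and frames prefixed with "? " are addresses found on the
stack that the unwinder could not confirm as part of the call chain.  As a
rough sketch of that reading (Python; FRAME_RE, reliable_frames and
first_module_frame are made-up names for illustration, and the pattern only
covers the frame format seen in this trace):

```python
import re

# One stack frame as printed in the trace above:
#   [207177.050075]  intel_vgpu_emulate_mmio_read+0x5d/0x3b0 [kvmgt]
# "? " marks an unverified entry; "[name]" is the owning module, if any.
FRAME_RE = re.compile(
    r"\]\s+(?P<guess>\? )?(?P<sym>[\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+"
    r"(?: \[(?P<mod>\w+)\])?\s*$"
)


def reliable_frames(trace_lines):
    """Yield (symbol, module_or_None) for frames not marked with '? '."""
    for line in trace_lines:
        m = FRAME_RE.search(line)
        if m and not m.group("guess"):
            yield m.group("sym"), m.group("mod")


def first_module_frame(trace_lines):
    """Return the innermost reliable frame that belongs to a module."""
    for sym, mod in reliable_frames(trace_lines):
        if mod:
            return sym, mod
    return None


# Frames copied from the quoted trace (innermost first).
trace = [
    "[207177.050060]  ? kvm_mmu_page_fault+0x1ee/0x980",
    "[207177.050068]  __mutex_lock.constprop.0+0x4e2/0x750",
    "[207177.050072]  mutex_lock+0x25/0x30",
    "[207177.050075]  intel_vgpu_emulate_mmio_read+0x5d/0x3b0 [kvmgt]",
]
print(first_module_frame(trace))
```

On these frames the sketch skips the "? kvm_mmu_page_fault" entry and reports
intel_vgpu_emulate_mmio_read in kvmgt as the module function that is waiting
on the mutex.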

> [207177.050084]  intel_vgpu_rw+0xb8/0x1c0 [kvmgt]
> [207177.050091]  intel_vgpu_read+0x20d/0x250 [kvmgt]
> [207177.050097]  vfio_device_fops_read+0x1f/0x40
> [207177.050100]  vfs_read+0x9b/0x160
> [207177.050102]  __x64_sys_pread64+0x93/0xd0
> [207177.050104]  do_syscall_64+0x58/0x80
> [207177.050106]  ? kvm_on_user_return+0x84/0xe0
> [207177.050107]  ? fire_user_return_notifiers+0x37/0x70
> [207177.050109]  ? exit_to_user_mode_prepare+0x41/0x200
> [207177.050111]  ? syscall_exit_to_user_mode+0x1b/0x40
> [207177.050112]  ? do_syscall_64+0x67/0x80
> [207177.050114]  ? irqentry_exit+0x54/0x70
> [207177.050115]  ? sysvec_call_function_single+0x4b/0xa0
> [207177.050116]  entry_SYSCALL_64_after_hwframe+0x63/0xcd
> [207177.050118] RIP: 0033:0x7ff51131293f
> [207177.050119] RSP: 002b:00007ff4ddffa260 EFLAGS: 00000293 ORIG_RAX:
> 0000000000000011
> [207177.050121] RAX: ffffffffffffffda RBX: 00005599a6835420 RCX:
> 00007ff51131293f
> [207177.050122] RDX: 0000000000000004 RSI: 00007ff4ddffa2a8 RDI:
> 0000000000000027
> [207177.050123] RBP: 0000000000000004 R08: 0000000000000000 R09:
> 00000000ffffffff
> [207177.050124] R10: 0000000000065f10 R11: 0000000000000293 R12:
> 0000000000065f10
> [207177.050124] R13: 00005599a6835330 R14: 0000000000000004 R15:
> 0000000000065f10
> [207177.050126]  </TASK>
> 
>      I am seeing this on Intel i7-6700k, i7-6850k, and i7-9700k platforms.
> 
>      This did not happen on 5.17 kernels, and 5.18 kernels never ran stable
> enough on my platforms to actually run them for more than a few minutes.
> 
>      Likewise 6.0-rc1 has not been stable enough to run in production.  After
> less than three hours running on my workstation it locked hard with even the
> magic sys-request key being unresponsive and only power cycling the machine got
> it back.
> 
>      The operating system in use for the host on all machines is Ubuntu 22.04.
> 
>      Guests vary with Ubuntu 22.04 being the most common but also Mint, Debian,
> Manjaro, Centos, Fedora, ScientificLinux, Zorin, and Windows being in use.
> 
>      I see the same issue manifest on platforms running only Ubuntu guests as
> with guests of varying operating systems.  
> 
>      The configuration file I used to compile this kernel is attached.  I
> compiled it with gcc 12.1.0.
> 
>      This behavior does not manifest itself instantly, typically the machine
> needs to be running 3-7 days before it does.  Once it does guests keep stalling
> and restarting libvirtd does not help.  Only thing that seems to is a hard
> reboot of the physical host.  For this reason I believe the issue lies strictly
> with the host and not the guests.
> 
>      I have listed it as a severity of high since it is completely service
> interrupting.
> 
> -- 
> You may reply to this email to add a comment.
> 
> You are receiving this mail because:
> You are watching the assignee of the bug.


* [Bug 216388] On Host, kernel errors in KVM, on guests, it shows CPU stalls
  2022-08-21  7:37 [Bug 216388] New: On Host, kernel errors in KVM, on guests, it shows CPU stalls bugzilla-daemon
  2022-08-22 17:50 ` Sean Christopherson
@ 2022-08-22 17:50 ` bugzilla-daemon
  2022-08-22 23:46 ` bugzilla-daemon
                   ` (20 subsequent siblings)
  22 siblings, 0 replies; 27+ messages in thread
From: bugzilla-daemon @ 2022-08-22 17:50 UTC (permalink / raw)
  To: kvm

https://bugzilla.kernel.org/show_bug.cgi?id=216388

--- Comment #1 from Sean Christopherson (seanjc@google.com) ---
+GVT folks

This isn't a KVM problem, it's a KVMGT problem (despite the name, KVMGT is very
much not KVM).


* Re: [Bug 216388] New: On Host, kernel errors in KVM, on guests, it shows CPU stalls
  2022-08-22 17:50 ` Sean Christopherson
@ 2022-08-22 23:21   ` Zhenyu Wang
  0 siblings, 0 replies; 27+ messages in thread
From: Zhenyu Wang @ 2022-08-22 23:21 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: bugzilla-daemon, kvm, Zhenyu Wang, Zhi Wang, intel-gvt-dev, intel-gfx

On 2022.08.22 17:50:33 +0000, Sean Christopherson wrote:
> +GVT folks
>
> On Sun, Aug 21, 2022, bugzilla-daemon@kernel.org wrote:
> > https://bugzilla.kernel.org/show_bug.cgi?id=216388
> > 
> >             Bug ID: 216388
> >            Summary: On Host, kernel errors in KVM, on guests, it shows CPU
> >                     stalls
> >            Product: Virtualization
> >            Version: unspecified
> >     Kernel Version: 5.19.0 / 5.19.1 / 5.19.2
> >           Hardware: All
> >                 OS: Linux
> >               Tree: Mainline
> >             Status: NEW
> >           Severity: high
> >           Priority: P1
> >          Component: kvm
> >           Assignee: virtualization_kvm@kernel-bugs.osdl.org
> >           Reporter: nanook@eskimo.com
> >         Regression: No
> > 
> > Created attachment 301614
> >   --> https://bugzilla.kernel.org/attachment.cgi?id=301614&action=edit
> > The configuration file used to compile this kernel.
> > 
> > This behavior has persisted across 5.19.0, 5.19.1, and 5.19.2.  While the
> > kernel I am taking this example from is tainted (owing to using Intel
> > development drivers for GPU virtualization), it is also occurring on
> > non-tainted kernels on servers with no development or third party modules
> > installed.
> > 
> > INFO: task CPU 2/KVM:2343 blocked for more than 1228 seconds.
> > [207177.050049]       Tainted: G     U    I       5.19.2 #1
> > [207177.050050] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables
> > this message.
> > [207177.050051] task:CPU 2/KVM       state:D stack:    0 pid: 2343 ppid:     1
> > flags:0x00000002
> > [207177.050054] Call Trace:
> > [207177.050055]  <TASK>
> > [207177.050056]  __schedule+0x359/0x1400
> > [207177.050060]  ? kvm_mmu_page_fault+0x1ee/0x980
> > [207177.050062]  ? kvm_set_msr_common+0x31f/0x1060
> > [207177.050065]  schedule+0x5f/0x100
> > [207177.050066]  schedule_preempt_disabled+0x15/0x30
> > [207177.050068]  __mutex_lock.constprop.0+0x4e2/0x750
> > [207177.050070]  ? aa_file_perm+0x124/0x4f0
> > [207177.050071]  __mutex_lock_slowpath+0x13/0x20
> > [207177.050072]  mutex_lock+0x25/0x30
> > [207177.050075]  intel_vgpu_emulate_mmio_read+0x5d/0x3b0 [kvmgt]
> 
> This isn't a KVM problem, it's a KVMGT problem (despite the name, KVMGT is very
> much not KVM).
> 
> > [207177.050084]  intel_vgpu_rw+0xb8/0x1c0 [kvmgt]
> > [207177.050091]  intel_vgpu_read+0x20d/0x250 [kvmgt]
> > [207177.050097]  vfio_device_fops_read+0x1f/0x40
> > [207177.050100]  vfs_read+0x9b/0x160
> > [207177.050102]  __x64_sys_pread64+0x93/0xd0
> > [207177.050104]  do_syscall_64+0x58/0x80
> > [207177.050106]  ? kvm_on_user_return+0x84/0xe0
> > [207177.050107]  ? fire_user_return_notifiers+0x37/0x70
> > [207177.050109]  ? exit_to_user_mode_prepare+0x41/0x200
> > [207177.050111]  ? syscall_exit_to_user_mode+0x1b/0x40
> > [207177.050112]  ? do_syscall_64+0x67/0x80
> > [207177.050114]  ? irqentry_exit+0x54/0x70
> > [207177.050115]  ? sysvec_call_function_single+0x4b/0xa0
> > [207177.050116]  entry_SYSCALL_64_after_hwframe+0x63/0xcd
> > [207177.050118] RIP: 0033:0x7ff51131293f
> > [207177.050119] RSP: 002b:00007ff4ddffa260 EFLAGS: 00000293 ORIG_RAX:
> > 0000000000000011
> > [207177.050121] RAX: ffffffffffffffda RBX: 00005599a6835420 RCX:
> > 00007ff51131293f
> > [207177.050122] RDX: 0000000000000004 RSI: 00007ff4ddffa2a8 RDI:
> > 0000000000000027
> > [207177.050123] RBP: 0000000000000004 R08: 0000000000000000 R09:
> > 00000000ffffffff
> > [207177.050124] R10: 0000000000065f10 R11: 0000000000000293 R12:
> > 0000000000065f10
> > [207177.050124] R13: 00005599a6835330 R14: 0000000000000004 R15:
> > 0000000000065f10
> > [207177.050126]  </TASK>
> > 
> >      I am seeing this on Intel i7-6700k, i7-6850k, and i7-9700k platforms.

One recent regression fix for Comet Lake is https://patchwork.freedesktop.org/patch/496987/;
it's on the way to 6.0-rc and would be pushed to 5.19 stable as well. But it looks like this
report affects more platforms? We'll double check.

Thanks

> > 
> >      This did not happen on 5.17 kernels, and 5.18 kernels never ran stable
> > enough on my platforms to actually run them for more than a few minutes.
> > 
> >      Likewise 6.0-rc1 has not been stable enough to run in production.  After
> > less than three hours running on my workstation it locked hard with even the
> > magic sys-request key being unresponsive and only power cycling the machine got
> > it back.
> > 
> >      The operating system in use for the host on all machines is Ubuntu 22.04.
> > 
> >      Guests vary with Ubuntu 22.04 being the most common but also Mint, Debian,
> > Manjaro, Centos, Fedora, ScientificLinux, Zorin, and Windows being in use.
> > 
> >      I see the same issue manifest on platforms running only Ubuntu guests as
> > with guests of varying operating systems.  
> > 
> >      The configuration file I used to compile this kernel is attached.  I
> > compiled it with gcc 12.1.0.
> > 
> >      This behavior does not manifest itself instantly, typically the machine
> > needs to be running 3-7 days before it does.  Once it does guests keep stalling
> > and restarting libvirtd does not help.  Only thing that seems to is a hard
> > reboot of the physical host.  For this reason I believe the issue lies strictly
> > with the host and not the guests.
> > 
> >      I have listed it as a severity of high since it is completely service
> > interrupting.
> > 
> > -- 
> > You may reply to this email to add a comment.
> > 
> > You are receiving this mail because:
> > You are watching the assignee of the bug.


* [Bug 216388] On Host, kernel errors in KVM, on guests, it shows CPU stalls
  2022-08-21  7:37 [Bug 216388] New: On Host, kernel errors in KVM, on guests, it shows CPU stalls bugzilla-daemon
  2022-08-22 17:50 ` Sean Christopherson
  2022-08-22 17:50 ` [Bug 216388] " bugzilla-daemon
@ 2022-08-22 23:46 ` bugzilla-daemon
  2022-08-23  0:57 ` bugzilla-daemon
                   ` (19 subsequent siblings)
  22 siblings, 0 replies; 27+ messages in thread
From: bugzilla-daemon @ 2022-08-22 23:46 UTC (permalink / raw)
  To: kvm

https://bugzilla.kernel.org/show_bug.cgi?id=216388

--- Comment #2 from zhenyuw@linux.intel.com ---
One recent regression fix for Comet Lake is
https://patchwork.freedesktop.org/patch/496987/;
it's on the way to 6.0-rc and would be pushed to 5.19 stable as well. But it
looks like this report affects more platforms? We'll double check.

Thanks


* [Bug 216388] On Host, kernel errors in KVM, on guests, it shows CPU stalls
  2022-08-21  7:37 [Bug 216388] New: On Host, kernel errors in KVM, on guests, it shows CPU stalls bugzilla-daemon
                   ` (2 preceding siblings ...)
  2022-08-22 23:46 ` bugzilla-daemon
@ 2022-08-23  0:57 ` bugzilla-daemon
  2022-08-27 19:42 ` bugzilla-daemon
                   ` (18 subsequent siblings)
  22 siblings, 0 replies; 27+ messages in thread
From: bugzilla-daemon @ 2022-08-23  0:57 UTC (permalink / raw)
  To: kvm

https://bugzilla.kernel.org/show_bug.cgi?id=216388

--- Comment #3 from Robert Dinse (nanook@eskimo.com) ---
     Regarding this being a KVMGT and NOT a KVM problem: while this report does
come from a machine where I have Intel GPU virtualization in use, it has also
occurred on three i7-6700k and i7-6850k machines with no GPU virtualization in
use.  GPU virtualization support is configured into the kernel on those
machines simply because I used the same config file for all of the machines.


* [Bug 216388] On Host, kernel errors in KVM, on guests, it shows CPU stalls
  2022-08-21  7:37 [Bug 216388] New: On Host, kernel errors in KVM, on guests, it shows CPU stalls bugzilla-daemon
                   ` (3 preceding siblings ...)
  2022-08-23  0:57 ` bugzilla-daemon
@ 2022-08-27 19:42 ` bugzilla-daemon
  2022-08-28 21:08 ` bugzilla-daemon
                   ` (17 subsequent siblings)
  22 siblings, 0 replies; 27+ messages in thread
From: bugzilla-daemon @ 2022-08-27 19:42 UTC (permalink / raw)
  To: kvm

https://bugzilla.kernel.org/show_bug.cgi?id=216388

--- Comment #4 from Robert Dinse (nanook@eskimo.com) ---
     I am not seeing this particular CPU stall on 5.19.4, but I am seeing other
CPU stalls.  I have opened three tickets on CPU stalls because each was in a
completely different task, but at this point I have to wonder whether they all
call some common code or use a common broken data structure.  Rather than open
40 more tickets that all turn out to be duplicates, perhaps someone familiar
with the kernel internals could look at two other tickets in addition to this
one: #216399, which is a stall in an MDRAID task, and #216405.  Before I open
yet another ticket, here is yet another CPU stall, this time in a task named
"worker":

[  489.383957] INFO: task worker:11403 blocked for more than 122 seconds.
[  489.383962]       Not tainted 5.19.4 #1
[  489.383964] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  489.383965] task:worker          state:D stack:    0 pid:11403 ppid:     1 flags:0x00004002
[  489.383968] Call Trace:
[  489.383970]  <TASK>
[  489.383973]  __schedule+0x367/0x1400
[  489.383980]  schedule+0x58/0xf0
[  489.383983]  io_schedule+0x46/0x80
[  489.383985]  folio_wait_bit_common+0x11e/0x350
[  489.383989]  ? filemap_invalidate_unlock_two+0x50/0x50
[  489.383992]  folio_wait_bit+0x18/0x20
[  489.383994]  folio_wait_writeback+0x2c/0x80
[  489.383997]  wait_on_page_writeback+0x18/0x50
[  489.383999]  __filemap_fdatawait_range+0x98/0x140
[  489.384003]  file_write_and_wait_range+0x83/0xb0
[  489.384005]  ext4_sync_file+0xf3/0x320
[  489.384009]  __x64_sys_fdatasync+0x4e/0xa0
[  489.384012]  ? syscall_enter_from_user_mode+0x50/0x70
[  489.384014]  do_syscall_64+0x58/0x80
[  489.384017]  ? sysvec_apic_timer_interrupt+0x4b/0xa0
[  489.384020]  entry_SYSCALL_64_after_hwframe+0x63/0xcd
[  489.384022] RIP: 0033:0x7f96e331bb1b
[  489.384025] RSP: 002b:00007f96788c75d0 EFLAGS: 00000293 ORIG_RAX: 000000000000004b
[  489.384027] RAX: ffffffffffffffda RBX: 00005639414e0860 RCX: 00007f96e331bb1b
[  489.384029] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 000000000000000b
[  489.384030] RBP: 0000563941270890 R08: 0000000000000000 R09: 0000000000000000
[  489.384031] R10: 00007f96788c75f0 R11: 0000000000000293 R12: 0000000000000000
[  489.384033] R13: 00005639412708f8 R14: 00005639425cedd0 R15: 00007ffded76f3d0
[  489.384036]  </TASK>

If this appears to be related I will not open a new ticket, but I am not
knowledgeable enough about the internals to know.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* [Bug 216388] On Host, kernel errors in KVM, on guests, it shows CPU stalls
  2022-08-21  7:37 [Bug 216388] New: On Host, kernel errors in KVM, on guests, it shows CPU stalls bugzilla-daemon
                   ` (4 preceding siblings ...)
  2022-08-27 19:42 ` bugzilla-daemon
@ 2022-08-28 21:08 ` bugzilla-daemon
  2022-09-01  6:09 ` bugzilla-daemon
                   ` (16 subsequent siblings)
  22 siblings, 0 replies; 27+ messages in thread
From: bugzilla-daemon @ 2022-08-28 21:08 UTC (permalink / raw)
  To: kvm

https://bugzilla.kernel.org/show_bug.cgi?id=216388

--- Comment #5 from Robert Dinse (nanook@eskimo.com) ---
Here is another example:

[98519.357381] Task dump for CPU 10:
[98519.357382] task:Embedded solr q state:R  running task     stack:    0 pid:607931 ppid:     1 flags:0x00000000
[98519.357389] Call Trace:
[98519.357393]  <TASK>
[98519.357399]  ? kvm_clock_get_cycles+0x11/0x20
[98519.357408]  ? ktime_get+0x46/0xc0
[98519.357411]  ? lapic_next_deadline+0x2c/0x40
[98519.357414]  ? clockevents_program_event+0xae/0x130
[98519.357418]  ? tick_program_event+0x43/0x90
[98519.357420]  ? hrtimer_interrupt+0x11f/0x220
[98519.357423]  ? exit_to_user_mode_prepare+0x41/0x1e0
[98519.357427]  ? irqentry_exit_to_user_mode+0x9/0x30
[98519.357430]  ? irqentry_exit+0x1d/0x30
[98519.357432]  ? sysvec_apic_timer_interrupt+0x4b/0xa0
[98519.357436]  ? asm_sysvec_apic_timer_interrupt+0x1b/0x20
[98519.357442]  </TASK>

As you can see these are happening all over hell and back.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* [Bug 216388] On Host, kernel errors in KVM, on guests, it shows CPU stalls
  2022-08-21  7:37 [Bug 216388] New: On Host, kernel errors in KVM, on guests, it shows CPU stalls bugzilla-daemon
                   ` (5 preceding siblings ...)
  2022-08-28 21:08 ` bugzilla-daemon
@ 2022-09-01  6:09 ` bugzilla-daemon
  2022-09-01 16:44   ` Sean Christopherson
  2022-09-01 16:44 ` bugzilla-daemon
                   ` (15 subsequent siblings)
  22 siblings, 1 reply; 27+ messages in thread
From: bugzilla-daemon @ 2022-09-01  6:09 UTC (permalink / raw)
  To: kvm

https://bugzilla.kernel.org/show_bug.cgi?id=216388

--- Comment #6 from Robert Dinse (nanook@eskimo.com) ---
Installed 5.19.6 on a couple of machines today, still getting CPU stalls but in
random locations:

[    6.601788] rcu: INFO: rcu_sched detected expedited stalls on CPUs/tasks: { 4-... } 3 jiffies s: 53 root: 0x10/.
[    6.601802] rcu: blocking rcu_node structures (internal RCU debug):
[    6.601806] Task dump for CPU 4:
[    6.601808] task:systemd-udevd   state:R  running task     stack:    0 pid:  468 ppid:   454 flags:0x0000400a
[    6.604313] Call Trace:                                                      
[    6.604324]  <TASK>                                                          
[    6.604326]  ? cpumask_any_but+0x35/0x50                                     
[    6.604336]  ? x2apic_send_IPI_allbutself+0x2f/0x40                          
[    6.604339]  ? do_sync_core+0x2a/0x30                                        
[    6.604342]  ? cpumask_next+0x23/0x30                                        
[    6.604344]  ? smp_call_function_many_cond+0xea/0x370                        
[    6.604347]  ? text_poke_memset+0x20/0x20                                    
[    6.604350]  ? arch_unregister_cpu+0x50/0x50                                 
[    6.604352]  ? on_each_cpu_cond_mask+0x1d/0x30                               
[    6.604354]  ? text_poke_bp_batch+0x1fb/0x210                                
[    6.604358]  ? enter_smm.constprop.0+0x51a/0xa70 [kvm]                       
[    6.604414]  ? vmx_set_cr0+0x16f0/0x16f0 [kvm_intel]                         
[    6.604457]  ? enter_smm.constprop.0+0x519/0xa70 [kvm]                       
[    6.604501]  ? text_poke_bp+0x49/0x70                                        
[    6.604504]  ? __static_call_transform+0x7f/0x120                            
[    6.604506]  ? arch_static_call_transform+0x87/0xa0                          
[    6.604508]  ? enter_smm.constprop.0+0x519/0xa70 [kvm]                       
[    6.604552]  ? __static_call_update+0x16e/0x220                              
[    6.604554]  ? vmx_set_cr0+0x16f0/0x16f0 [kvm_intel]                         
[    6.604567]  ? kvm_arch_hardware_setup+0x35a/0x17f0 [kvm]                    
[    6.604611]  ? __kmalloc_node+0x16c/0x380                                    
[    6.604615]  ? kvm_init+0xa2/0x400 [kvm]                                     
[    6.604654]  ? hardware_setup+0x7e2/0x8cc [kvm_intel]                        
[    6.604666]  ? vmx_init+0xf9/0x201 [kvm_intel]                               
[    6.604676]  ? hardware_setup+0x8cc/0x8cc [kvm_intel]                        
[    6.604685]  ? do_one_initcall+0x47/0x1e0                                    
[    6.604689]  ? kmem_cache_alloc_trace+0x16c/0x2b0                            
[    6.604692]  ? do_init_module+0x50/0x1f0                                     
[    6.604694]  ? load_module+0x21bd/0x25e0                                     
[    6.604696]  ? ima_post_read_file+0xd5/0x100                                 
[    6.604700]  ? kernel_read_file+0x23d/0x2e0                                  
[    6.604703]  ? __do_sys_finit_module+0xbd/0x130                              
[    6.604705]  ? __do_sys_finit_module+0xbd/0x130                              
[    6.604708]  ? __x64_sys_finit_module+0x18/0x20                              
[    6.604710]  ? do_syscall_64+0x58/0x80                                       
[    6.604713]  ? syscall_exit_to_user_mode+0x1b/0x40                           
[    6.604715]  ? do_syscall_64+0x67/0x80                                       
[    6.604718]  ? switch_fpu_return+0x4e/0xc0                                   
[    6.604720]  ? exit_to_user_mode_prepare+0x184/0x1e0                         
[    6.604723]  ? syscall_exit_to_user_mode+0x1b/0x40                           
[    6.604725]  ? do_syscall_64+0x67/0x80                                       
[    6.604728]  ? do_syscall_64+0x67/0x80                                       
[    6.604730]  ? do_syscall_64+0x67/0x80                                       
[    6.604732]  ? sysvec_call_function+0x4b/0xa0                                
[    6.604735]  ? entry_SYSCALL_64_after_hwframe+0x63/0xcd                      
[    6.604739]  </TASK>     
[    6.697044] rcu: INFO: rcu_sched detected expedited stalls on CPUs/tasks: { 4-... } 13 jiffies s: 53 root: 0x10/.
[    6.697051] rcu: blocking rcu_node structures (internal RCU debug):
[    6.697052] Task dump for CPU 4:
[    6.697053] task:systemd-udevd   state:R  running task     stack:    0 pid:  468 ppid:   454 flags:0x0000400a
[    6.697057] Call Trace:                                                      
[    6.697058]  <TASK>                                                          
[    6.697059]  ? cpumask_any_but+0x35/0x50                                     
[    6.697065]  ? x2apic_send_IPI_allbutself+0x2f/0x40                          
[    6.697068]  ? do_sync_core+0x2a/0x30                                        
[    6.697071]  ? cpumask_next+0x23/0x30                                        
[    6.697072]  ? smp_call_function_many_cond+0xea/0x370                        
[    6.697075]  ? text_poke_memset+0x20/0x20                                    
[    6.697077]  ? arch_unregister_cpu+0x50/0x50                                 
[    6.697080]  ? on_each_cpu_cond_mask+0x1d/0x30                               
[    6.697081]  ? text_poke_bp_batch+0x1fb/0x210                                
[    6.697084]  ? kvm_set_msr_common+0x939/0x1060 [kvm]                         
[    6.697133]  ? vmx_set_efer.part.0+0x160/0x160 [kvm_intel]                   
[    6.697147]  ? kvm_set_msr_common+0x938/0x1060 [kvm]                         
[    6.697187]  ? text_poke_bp+0x49/0x70                                        
[    6.697189]  ? __static_call_transform+0x7f/0x120                            
[    6.697191]  ? arch_static_call_transform+0x87/0xa0                          
[    6.697193]  ? kvm_set_msr_common+0x938/0x1060 [kvm]                         
[    6.697234]  ? __static_call_update+0x16e/0x220                              
[    6.697236]  ? vmx_set_efer.part.0+0x160/0x160 [kvm_intel]                   
[    6.697246]  ? kvm_arch_hardware_setup+0x423/0x17f0 [kvm]                    
[    6.697286]  ? __kmalloc_node+0x16c/0x380                                    
[    6.697290]  ? kvm_init+0xa2/0x400 [kvm]                                     
[    6.697326]  ? hardware_setup+0x7e2/0x8cc [kvm_intel]                        
[    6.697336]  ? vmx_init+0xf9/0x201 [kvm_intel]                               
[    6.697345]  ? hardware_setup+0x8cc/0x8cc [kvm_intel]                        
[    6.697353]  ? do_one_initcall+0x47/0x1e0                                    
[    6.697356]  ? kmem_cache_alloc_trace+0x16c/0x2b0                            
[    6.697359]  ? do_init_module+0x50/0x1f0                                     
[    6.697360]  ? load_module+0x21bd/0x25e0                                     
[    6.697362]  ? ima_post_read_file+0xd5/0x100                                 
[    6.697365]  ? kernel_read_file+0x23d/0x2e0                                  
[    6.697368]  ? __do_sys_finit_module+0xbd/0x130                              
[    6.697370]  ? __do_sys_finit_module+0xbd/0x130                              
[    6.697372]  ? __x64_sys_finit_module+0x18/0x20                              
[    6.697373]  ? do_syscall_64+0x58/0x80                                       
[    6.697376]  ? syscall_exit_to_user_mode+0x1b/0x40
[    6.697377]  ? do_syscall_64+0x67/0x80
[    6.697379]  ? switch_fpu_return+0x4e/0xc0
[    6.697382]  ? exit_to_user_mode_prepare+0x184/0x1e0
[    6.697384]  ? syscall_exit_to_user_mode+0x1b/0x40
[    6.697386]  ? do_syscall_64+0x67/0x80
[    6.697387]  ? do_syscall_64+0x67/0x80
[    6.697389]  ? do_syscall_64+0x67/0x80
[    6.697391]  ? sysvec_call_function+0x4b/0xa0
[    6.697393]  ? entry_SYSCALL_64_after_hwframe+0x63/0xcd
[    6.697397]  </TASK>

[    6.798781] rcu: INFO: rcu_sched detected expedited stalls on CPUs/tasks: { 4-... } 23 jiffies s: 53 root: 0x10/.
[    6.798787] rcu: blocking rcu_node structures (internal RCU debug):
[    6.798833] Task dump for CPU 4:
[    6.798952] task:systemd-udevd   state:R  running task     stack:    0 pid:  468 ppid:   454 flags:0x0000400a
[    6.798957] Call Trace:
[    6.798959]  <TASK>
[    6.798960]  ? cpumask_any_but+0x35/0x50
[    6.798967]  ? x2apic_send_IPI_allbutself+0x2f/0x40
[    6.798969]  ? do_sync_core+0x2a/0x30
[    6.800010]  ? cpumask_next+0x23/0x30
[    6.800014]  ? smp_call_function_many_cond+0xea/0x370
[    6.800017]  ? text_poke_memset+0x20/0x20
[    6.800019]  ? arch_unregister_cpu+0x50/0x50
[    6.800024]  ? __SCT__kvm_x86_set_rflags+0x8/0x8 [kvm]
[    6.800096]  ? vmx_get_rflags+0x130/0x130 [kvm_intel]
[    6.800109]  ? on_each_cpu_cond_mask+0x1d/0x30
[    6.800110]  ? text_poke_bp_batch+0xaf/0x210
[    6.800113]  ? vmx_get_rflags+0x130/0x130 [kvm_intel]
[    6.800121]  ? __SCT__kvm_x86_set_rflags+0x8/0x8 [kvm]
[    6.800172]  ? vmx_get_rflags+0x130/0x130 [kvm_intel]
[    6.800180]  ? text_poke_bp+0x49/0x70
[    6.800182]  ? __static_call_transform+0x7f/0x120
[    6.800183]  ? arch_static_call_transform+0x58/0xa0
[    6.800185]  ? __SCT__kvm_x86_set_rflags+0x8/0x8 [kvm]
[    6.800233]  ? __static_call_update+0x62/0x220
[    6.800235]  ? vmx_get_rflags+0x130/0x130 [kvm_intel]
[    6.800243]  ? kvm_arch_hardware_setup+0x581/0x17f0 [kvm]
[    6.800284]  ? __kmalloc_node+0x16c/0x380
[    6.800288]  ? kvm_init+0xa2/0x400 [kvm]
[    6.800324]  ? hardware_setup+0x7e2/0x8cc [kvm_intel]
[    6.800334]  ? vmx_init+0xf9/0x201 [kvm_intel]
[    6.800342]  ? hardware_setup+0x8cc/0x8cc [kvm_intel]
[    6.800350]  ? do_one_initcall+0x47/0x1e0
[    6.800352]  ? kmem_cache_alloc_trace+0x16c/0x2b0
[    6.800355]  ? do_init_module+0x50/0x1f0
[    6.800357]  ? load_module+0x21bd/0x25e0
[    6.800358]  ? ima_post_read_file+0xd5/0x100
[    6.800361]  ? kernel_read_file+0x23d/0x2e0
[    6.800364]  ? __do_sys_finit_module+0xbd/0x130
[    6.800365]  ? __do_sys_finit_module+0xbd/0x130
[    6.800368]  ? __x64_sys_finit_module+0x18/0x20
[    6.800369]  ? do_syscall_64+0x58/0x80
[    6.800371]  ? syscall_exit_to_user_mode+0x1b/0x40
[    6.800373]  ? do_syscall_64+0x67/0x80
[    6.800375]  ? switch_fpu_return+0x4e/0xc0
[    6.800377]  ? exit_to_user_mode_prepare+0x184/0x1e0
[    6.800379]  ? syscall_exit_to_user_mode+0x1b/0x40
[    6.800380]  ? do_syscall_64+0x67/0x80
[    6.800382]  ? do_syscall_64+0x67/0x80
[    6.800384]  ? do_syscall_64+0x67/0x80
[    6.800385]  ? sysvec_call_function+0x4b/0xa0
[    6.800387]  ? entry_SYSCALL_64_after_hwframe+0x63/0xcd
[    6.800391]  </TASK>

     Are these related or should I open a new ticket?  These occurred right
after boot.
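
To compare reports like the three above, it can help to pull the CPU number and
the jiffies count out of each stall header.  The following is only a minimal
Python sketch and not part of the original report: the function name
parse_expedited_stalls and the regular expression are illustrative assumptions
based solely on the header format quoted in this thread.

```python
import re

# Header of an expedited RCU stall report as quoted in this thread.
# \s+ / \s* tolerate the line wrapping that mail clients add mid-message.
STALL_RE = re.compile(
    r"rcu:\s+INFO:\s+(?P<flavor>\w+)\s+detected\s+expedited\s+stalls\s+on\s+"
    r"CPUs/tasks:\s*\{\s*(?P<cpus>[^}]*?)\s*\}\s*(?P<jiffies>\d+)\s+jiffies"
)


def parse_expedited_stalls(log_text):
    """Return a list of (cpu_numbers, jiffies) tuples, one per stall header."""
    results = []
    for match in STALL_RE.finditer(log_text):
        cpus = []
        for token in match.group("cpus").split():
            # Entries look like "4-..."; keep only a leading CPU number and
            # skip anything that does not start with digits.
            lead = re.match(r"\d+", token)
            if lead:
                cpus.append(int(lead.group()))
        results.append((cpus, int(match.group("jiffies"))))
    return results


if __name__ == "__main__":
    import sys

    # Usage: dmesg | python3 this_script.py
    for cpus, jiffies in parse_expedited_stalls(sys.stdin.read()):
        print(f"CPUs {cpus} stalled for {jiffies} jiffies")
```

Applied to the three headers above, this sketch would report CPU 4 each time,
with 3, 13 and 23 jiffies respectively.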

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [Bug 216388] On Host, kernel errors in KVM, on guests, it shows CPU stalls
  2022-09-01  6:09 ` bugzilla-daemon
@ 2022-09-01 16:44   ` Sean Christopherson
  0 siblings, 0 replies; 27+ messages in thread
From: Sean Christopherson @ 2022-09-01 16:44 UTC (permalink / raw)
  To: bugzilla-daemon; +Cc: kvm

On Thu, Sep 01, 2022, bugzilla-daemon@kernel.org wrote:
> https://bugzilla.kernel.org/show_bug.cgi?id=216388
> 
> --- Comment #6 from Robert Dinse (nanook@eskimo.com) ---
> Installed 5.19.6 on a couple of machines today, still getting CPU stalls but in
> random locations:

...

>      Are these related or should I open a new ticket?  These occurred right
> after boot.

Odds are very good that all of the stalls are due to one bug.  Stall warnings fire
when a task or CPU waiting on an RCU grace period hasn't made forward progress in
a certain amount of time.  In both cases, many times the CPU yelling that it's
stalled is a victim and not the culprit, i.e. a stalled task/CPU often indicates
that something is broken elsewhere in the system that is preventing forward progress
on _this_ task/CPU.

Normally I would suggest bisecting, but given that v5.18 is broken for you that
probably isn't an option.

In the logs, are there any common patterns (beyond running KVM)?  E.g. any functions
that show up in stack traces in all instances?  If nothing obvious jumps out, it
might be worth uploading a pile of (compressed) traces somewhere so that others can
poke through them; maybe someone will find the needle.
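
The search for functions common to all traces could be sketched roughly as
follows.  This is a hedged illustration, not a tool anyone in this thread used:
the helper names (split_traces, functions_in_trace, common_functions) are
hypothetical, and the parsing assumes traces are delimited by <TASK> and
</TASK> and that frames look like "symbol+0xOFF/0xSIZE", as in the logs quoted
in this bug.

```python
import re
from collections import Counter

# A stack frame such as " ? text_poke_bp+0x49/0x70" or " schedule+0x58/0xf0";
# the capture group is the symbol name in front of "+0x".
FRAME_RE = re.compile(r"([A-Za-z_][\w.]*)\+0x[0-9a-f]+/0x[0-9a-f]+")


def split_traces(log_text):
    """Return the text found between each <TASK> and </TASK> marker."""
    return re.findall(r"<TASK>(.*?)</TASK>", log_text, flags=re.DOTALL)


def functions_in_trace(trace_text):
    """Return the set of symbol names that appear in one trace."""
    return set(FRAME_RE.findall(trace_text))


def common_functions(log_text):
    """Return (symbols present in every trace, Counter of traces per symbol)."""
    traces = [functions_in_trace(t) for t in split_traces(log_text)]
    counts = Counter(fn for fns in traces for fn in fns)
    in_all = set.intersection(*traces) if traces else set()
    return in_all, counts


if __name__ == "__main__":
    import sys

    # Usage: cat collected-dmesg-*.txt | python3 this_script.py
    in_all, counts = common_functions(sys.stdin.read())
    print("In every trace:", sorted(in_all))
    for name, seen in counts.most_common(20):
        print(f"{seen:4d}  {name}")
```

Frames marked with "?" are unreliable guesses by the unwinder, so a symbol that
shows up everywhere is a hint for where to look, not proof of the culprit.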

^ permalink raw reply	[flat|nested] 27+ messages in thread

* [Bug 216388] On Host, kernel errors in KVM, on guests, it shows CPU stalls
  2022-08-21  7:37 [Bug 216388] New: On Host, kernel errors in KVM, on guests, it shows CPU stalls bugzilla-daemon
                   ` (6 preceding siblings ...)
  2022-09-01  6:09 ` bugzilla-daemon
@ 2022-09-01 16:44 ` bugzilla-daemon
  2022-09-01 19:46 ` bugzilla-daemon
                   ` (14 subsequent siblings)
  22 siblings, 0 replies; 27+ messages in thread
From: bugzilla-daemon @ 2022-09-01 16:44 UTC (permalink / raw)
  To: kvm

https://bugzilla.kernel.org/show_bug.cgi?id=216388

--- Comment #7 from Sean Christopherson (seanjc@google.com) ---
On Thu, Sep 01, 2022, bugzilla-daemon@kernel.org wrote:
> https://bugzilla.kernel.org/show_bug.cgi?id=216388
> 
> --- Comment #6 from Robert Dinse (nanook@eskimo.com) ---
> Installed 5.19.6 on a couple of machines today, still getting CPU stalls but
> in random locations:

...

>      Are these related or should I open a new ticket?  These occurred right
> after boot.

Odds are very good that all of the stalls are due to one bug.  Stall warnings
fire when a task or CPU waiting on an RCU grace period hasn't made forward
progress in a certain amount of time.  In both cases, many times the CPU
yelling that it's stalled is a victim and not the culprit, i.e. a stalled
task/CPU often indicates that something is broken elsewhere in the system that
is preventing forward progress on _this_ task/CPU.

Normally I would suggest bisecting, but given that v5.18 is broken for you that
probably isn't an option.

In the logs, are there any common patterns (beyond running KVM)?  E.g. any
functions that show up in stack traces in all instances?  If nothing obvious
jumps out, it might be worth uploading a pile of (compressed) traces somewhere
so that others can poke through them; maybe someone will find the needle.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* [Bug 216388] On Host, kernel errors in KVM, on guests, it shows CPU stalls
  2022-08-21  7:37 [Bug 216388] New: On Host, kernel errors in KVM, on guests, it shows CPU stalls bugzilla-daemon
                   ` (7 preceding siblings ...)
  2022-09-01 16:44 ` bugzilla-daemon
@ 2022-09-01 19:46 ` bugzilla-daemon
  2022-09-01 21:37 ` bugzilla-daemon
                   ` (13 subsequent siblings)
  22 siblings, 0 replies; 27+ messages in thread
From: bugzilla-daemon @ 2022-09-01 19:46 UTC (permalink / raw)
  To: kvm

https://bugzilla.kernel.org/show_bug.cgi?id=216388

--- Comment #8 from Robert Dinse (nanook@eskimo.com) ---
     I will scour the logs and see what I can find.  My understanding is that
Ubuntu 22.10 is going to be 5.19 based, but Ubuntu does not run a tickless
kernel, so they would not see this if it is related to that.  I may try
compiling the host machines non-tickless just to see if that makes a
difference.

     I could run 5.18 on my workstation, but it is not busy enough to see these
frequently (maybe once a week).  Oh, the three stalls above were all on a KVM
guest rather than a host machine.  They were from my web server, which is quite
a busy machine.  I tried 6.0.0-rc3 on my workstation and it is still wonky.  It
no longer locks up hard (no error, not even magic SysRq working), but now video
will not play from bitchute, and video from odysee plays without audio, while
youtube gets both audio and video, all from the same browser (Firefox), so I
don't know where to even start with that.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* [Bug 216388] On Host, kernel errors in KVM, on guests, it shows CPU stalls
  2022-08-21  7:37 [Bug 216388] New: On Host, kernel errors in KVM, on guests, it shows CPU stalls bugzilla-daemon
                   ` (8 preceding siblings ...)
  2022-09-01 19:46 ` bugzilla-daemon
@ 2022-09-01 21:37 ` bugzilla-daemon
  2022-09-02  5:46 ` bugzilla-daemon
                   ` (12 subsequent siblings)
  22 siblings, 0 replies; 27+ messages in thread
From: bugzilla-daemon @ 2022-09-01 21:37 UTC (permalink / raw)
  To: kvm

https://bugzilla.kernel.org/show_bug.cgi?id=216388

--- Comment #9 from Robert Dinse (nanook@eskimo.com) ---
I spent some time digging through the web server logs.  These are from the
machine that produced the last three CPU stall messages, but this time the
stall occurred in apache2.  This is cool because that is something I can
readily test.  Here is the stall message:

Sep  1 14:26:53 ftp kernel: [   18.819394][  T298] rcu: INFO: rcu_sched detected expedited stalls on CPUs/tasks: { 3-... } 4 jiffies s: 441 root: 0x8/.
Sep  1 14:26:53 ftp kernel: [   18.819413][  T298] rcu: blocking rcu_node structures (internal RCU debug):
Sep  1 14:26:53 ftp kernel: [   18.819417][  T298] Task dump for CPU 3:
Sep  1 14:26:53 ftp kernel: [   18.819418][  T298] task:httpd           state:R  running task     stack:    0 pid: 2798 ppid:  2460 flags:0x0000400a
Sep  1 14:26:53 ftp kernel: [   18.819424][  T298] Call Trace:
Sep  1 14:26:53 ftp kernel: [   18.819428][  T298]  <TASK>
Sep  1 14:26:53 ftp kernel: [   18.819437][  T298]  ? alloc_pages+0x90/0x1a0
Sep  1 14:26:53 ftp kernel: [   18.819443][  T298]  ? allocate_slab+0x274/0x460
Sep  1 14:26:53 ftp kernel: [   18.819445][  T298]  ? xa_load+0xa6/0xc0
Sep  1 14:26:53 ftp kernel: [   18.819448][  T298]  ? ___slab_alloc.constprop.0+0x50b/0x5f0
Sep  1 14:26:53 ftp kernel: [   18.819451][  T298]  ? kmem_cache_alloc_lru+0x297/0x360
Sep  1 14:26:53 ftp kernel: [   18.819456][  T298]  ? nfs_find_actor+0x90/0x90 [nfs]
Sep  1 14:26:53 ftp kernel: [   18.819504][  T298]  ? nfs_alloc_inode+0x21/0x60 [nfs]
Sep  1 14:26:53 ftp kernel: [   18.819519][  T298]  ? alloc_inode+0x23/0xc0
Sep  1 14:26:53 ftp kernel: [   18.819526][  T298]  ? nfs_alloc_fhandle+0x30/0x30 [nfs]
Sep  1 14:26:53 ftp kernel: [   18.819541][  T298]  ? iget5_locked+0x53/0xa0
Sep  1 14:26:53 ftp kernel: [   18.819543][  T298]  ? list_lru_add+0x13f/0x190
Sep  1 14:26:53 ftp kernel: [   18.819547][  T298]  ? nfs_fhget+0xd2/0x6d0 [nfs]
Sep  1 14:26:53 ftp kernel: [   18.819570][  T298]  ? nfs_readdir_entry_decode+0x31e/0x440 [nfs]
Sep  1 14:26:53 ftp kernel: [   18.819581][  T298]  ? nfs_readdir_page_filler+0x10d/0x4f0 [nfs]
Sep  1 14:26:53 ftp kernel: [   18.819592][  T298]  ? nfs_readdir_xdr_to_array+0x45e/0x4a0 [nfs]
Sep  1 14:26:53 ftp kernel: [   18.819602][  T298]  ? nfs_readdir+0x2e6/0xea0 [nfs]
Sep  1 14:26:53 ftp kernel: [   18.819613][  T298]  ? iterate_dir+0x9b/0x1d0
Sep  1 14:26:53 ftp kernel: [   18.819615][  T298]  ? __x64_sys_getdents64+0x84/0x120
Sep  1 14:26:53 ftp kernel: [   18.819616][  T298]  ? __ia32_sys_getdents64+0x120/0x120
Sep  1 14:26:53 ftp kernel: [   18.819618][  T298]  ? do_syscall_64+0x5b/0x80
Sep  1 14:26:53 ftp kernel: [   18.819620][  T298]  ? do_user_addr_fault+0x1c1/0x620
Sep  1 14:26:53 ftp kernel: [   18.819622][  T298]  ? exit_to_user_mode_prepare+0x41/0x1e0
Sep  1 14:26:53 ftp kernel: [   18.819625][  T298]  ? irqentry_exit_to_user_mode+0x9/0x30
Sep  1 14:26:53 ftp kernel: [   18.819626][  T298]  ? irqentry_exit+0x1d/0x30
Sep  1 14:26:53 ftp kernel: [   18.819627][  T298]  ? exc_page_fault+0x86/0x160
Sep  1 14:26:53 ftp kernel: [   18.819628][  T298]  ? entry_SYSCALL_64_after_hwframe+0x63/0xcd
Sep  1 14:26:53 ftp kernel: [   18.819631][  T298]  </TASK>

     Now that process is gone, but the parent process is still running and
Apache still seems to be responding fine.  Checking the error log, there were
no errors with that PID.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* [Bug 216388] On Host, kernel errors in KVM, on guests, it shows CPU stalls
  2022-08-21  7:37 [Bug 216388] New: On Host, kernel errors in KVM, on guests, it shows CPU stalls bugzilla-daemon
                   ` (9 preceding siblings ...)
  2022-09-01 21:37 ` bugzilla-daemon
@ 2022-09-02  5:46 ` bugzilla-daemon
  2022-09-02  8:36 ` bugzilla-daemon
                   ` (11 subsequent siblings)
  22 siblings, 0 replies; 27+ messages in thread
From: bugzilla-daemon @ 2022-09-02  5:46 UTC (permalink / raw)
  To: kvm

https://bugzilla.kernel.org/show_bug.cgi?id=216388

--- Comment #10 from Robert Dinse (nanook@eskimo.com) ---
This >MAY< be a compiler issue.  I was wondering why I seem to be the only one
having this problem.  Given how frequently it happens to me, I would expect a
gazillion me too's, but so far none.

I know very few people seem to be using gcc 12.1.0, which I was using, because
I seem to be the only person who had problems compiling 5.18 with it.

Since gcc 12.2.0 was out, I built it today and rebuilt a kernel using it.  So
far that kernel has not produced any CPU stall reports.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* [Bug 216388] On Host, kernel errors in KVM, on guests, it shows CPU stalls
  2022-08-21  7:37 [Bug 216388] New: On Host, kernel errors in KVM, on guests, it shows CPU stalls bugzilla-daemon
                   ` (10 preceding siblings ...)
  2022-09-02  5:46 ` bugzilla-daemon
@ 2022-09-02  8:36 ` bugzilla-daemon
  2022-09-03  1:37 ` bugzilla-daemon
                   ` (10 subsequent siblings)
  22 siblings, 0 replies; 27+ messages in thread
From: bugzilla-daemon @ 2022-09-02  8:36 UTC (permalink / raw)
  To: kvm

https://bugzilla.kernel.org/show_bug.cgi?id=216388

--- Comment #11 from Robert Dinse (nanook@eskimo.com) ---
Well, it is still happening with gcc-12.2.0, but it seems to be somewhat less
frequent.

[    7.092312] rcu: INFO: rcu_sched detected expedited stalls on CPUs/tasks: { 4-... } 3 jiffies s: 389 root: 0x10/.
[    7.092329] rcu: blocking rcu_node structures (internal RCU debug):
[    7.092332] Task dump for CPU 4:
[    7.092334] task:modprobe        state:R  running task     stack:    0 pid: 1502 ppid:     8 flags:0x0000400a
[    7.092338] Call Trace:
[    7.092344]  <TASK>
[    7.092347]  ? __wake_up_common_lock+0x87/0xc0
[    7.092355]  ? sysvec_apic_timer_interrupt+0x90/0xa0
[    7.092361]  ? insn_get_prefixes+0x1f1/0x440
[    7.092365]  ? load_new_mm_cr3+0x7f/0xe0
[    7.092368]  ? cpumask_any_but+0x35/0x50
[    7.092372]  ? x2apic_send_IPI_allbutself+0x2f/0x40
[    7.092375]  ? do_sync_core+0x2a/0x30
[    7.092379]  ? cpumask_next+0x23/0x30
[    7.092381]  ? smp_call_function_many_cond+0xea/0x370
[    7.092386]  ? text_poke_memset+0x20/0x20
[    7.092389]  ? arch_unregister_cpu+0x50/0x50
[    7.092394]  ? __fscache_acquire_cookie+0x4f4/0x500 [fscache]
[    7.092407]  ? on_each_cpu_cond_mask+0x1d/0x30
[    7.092409]  ? text_poke_bp_batch+0xaf/0x210
[    7.092412]  ? __traceiter_fscache_volume+0x60/0x60 [fscache]
[    7.092421]  ? __fscache_acquire_cookie+0x4f4/0x500 [fscache]
[    7.092429]  ? __fscache_acquire_cookie+0x4f4/0x500 [fscache]
[    7.092438]  ? text_poke_bp+0x49/0x70
[    7.092440]  ? __static_call_transform+0x7f/0x120
[    7.092442]  ? arch_static_call_transform+0x87/0xa0
[    7.092446]  ? __static_call_init+0x167/0x210
[    7.092450]  ? static_call_module_notify+0x13e/0x1a0
[    7.092452]  ? blocking_notifier_call_chain_robust+0x72/0xd0
[    7.092456]  ? load_module+0x2068/0x25e0
[    7.092459]  ? ima_post_read_file+0xd5/0x100
[    7.092464]  ? __do_sys_finit_module+0xbd/0x130
[    7.092466]  ? __do_sys_finit_module+0xbd/0x130
[    7.092469]  ? __x64_sys_finit_module+0x18/0x20
[    7.092470]  ? do_syscall_64+0x5b/0x80
[    7.092474]  ? ksys_mmap_pgoff+0x108/0x250
[    7.092478]  ? do_syscall_64+0x67/0x80
[    7.092480]  ? exit_to_user_mode_prepare+0x41/0x1e0
[    7.092485]  ? syscall_exit_to_user_mode+0x1b/0x40
[    7.092487]  ? do_syscall_64+0x67/0x80
[    7.092489]  ? do_syscall_64+0x67/0x80
[    7.092492]  ? entry_SYSCALL_64_after_hwframe+0x63/0xcd
[    7.092496]  </TASK>

^ permalink raw reply	[flat|nested] 27+ messages in thread

* [Bug 216388] On Host, kernel errors in KVM, on guests, it shows CPU stalls
  2022-08-21  7:37 [Bug 216388] New: On Host, kernel errors in KVM, on guests, it shows CPU stalls bugzilla-daemon
                   ` (11 preceding siblings ...)
  2022-09-02  8:36 ` bugzilla-daemon
@ 2022-09-03  1:37 ` bugzilla-daemon
  2022-09-03  2:03 ` bugzilla-daemon
                   ` (9 subsequent siblings)
  22 siblings, 0 replies; 27+ messages in thread
From: bugzilla-daemon @ 2022-09-03  1:37 UTC (permalink / raw)
  To: kvm

https://bugzilla.kernel.org/show_bug.cgi?id=216388

--- Comment #12 from Robert Dinse (nanook@eskimo.com) ---
There are four machines where these seem to happen within moments of a boot, so
I am going to five the EOL 5.18.19 a try as they are all guest machines and
thus I can easily reboot remotely into a working kernel if 5.18 locks or
otherwise does not work.

Another thing I tried was raising the RCU expedited stall timeout from 20ms to
40ms, but that made no difference.
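
The comment does not say how the timeout was changed.  As a hedged sketch only,
the snippet below reads the current value on the ASSUMPTION that the knob is
exposed as the rcupdate module parameter rcu_exp_cpu_stall_timeout under
/sys/module/rcupdate/parameters/ and is expressed in milliseconds; neither the
path nor the unit is confirmed anywhere in this thread, and the helper name
read_exp_stall_timeout_ms is hypothetical.

```python
from pathlib import Path

# ASSUMED sysfs location of the expedited stall timeout; not stated in the
# thread.  Pass a different path if your kernel exposes it elsewhere.
DEFAULT_PARAM = Path("/sys/module/rcupdate/parameters/rcu_exp_cpu_stall_timeout")


def read_exp_stall_timeout_ms(param_path=DEFAULT_PARAM):
    """Return the timeout as an int, or None if the file is missing or invalid."""
    try:
        return int(Path(param_path).read_text().strip())
    except (OSError, ValueError):
        return None


if __name__ == "__main__":
    value = read_exp_stall_timeout_ms()
    if value is None:
        print("parameter not found; this kernel may not expose it")
    else:
        print(f"expedited stall timeout: {value}")
```

Checking the value on a running guest is a quick way to confirm that a change
made at build time or on the kernel command line actually took effect.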

^ permalink raw reply	[flat|nested] 27+ messages in thread

* [Bug 216388] On Host, kernel errors in KVM, on guests, it shows CPU stalls
  2022-08-21  7:37 [Bug 216388] New: On Host, kernel errors in KVM, on guests, it shows CPU stalls bugzilla-daemon
                   ` (12 preceding siblings ...)
  2022-09-03  1:37 ` bugzilla-daemon
@ 2022-09-03  2:03 ` bugzilla-daemon
  2022-09-03  5:31 ` bugzilla-daemon
                   ` (8 subsequent siblings)
  22 siblings, 0 replies; 27+ messages in thread
From: bugzilla-daemon @ 2022-09-03  2:03 UTC (permalink / raw)
  To: kvm

https://bugzilla.kernel.org/show_bug.cgi?id=216388

--- Comment #13 from Robert Dinse (nanook@eskimo.com) ---
In "I am going to five" I meant to say "give", but there is no edit function
here...

^ permalink raw reply	[flat|nested] 27+ messages in thread

* [Bug 216388] On Host, kernel errors in KVM, on guests, it shows CPU stalls
  2022-08-21  7:37 [Bug 216388] New: On Host, kernel errors in KVM, on guests, it shows CPU stalls bugzilla-daemon
                   ` (13 preceding siblings ...)
  2022-09-03  2:03 ` bugzilla-daemon
@ 2022-09-03  5:31 ` bugzilla-daemon
  2022-09-03  5:37 ` bugzilla-daemon
                   ` (7 subsequent siblings)
  22 siblings, 0 replies; 27+ messages in thread
From: bugzilla-daemon @ 2022-09-03  5:31 UTC (permalink / raw)
  To: kvm

https://bugzilla.kernel.org/show_bug.cgi?id=216388

--- Comment #14 from Robert Dinse (nanook@eskimo.com) ---
Ok, with 5.18.19 there are no rcu_sched expedited stall reports, so this is
definitely something that broke between 5.18.19 and 5.19.0.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* [Bug 216388] On Host, kernel errors in KVM, on guests, it shows CPU stalls
  2022-08-21  7:37 [Bug 216388] New: On Host, kernel errors in KVM, on guests, it shows CPU stalls bugzilla-daemon
                   ` (14 preceding siblings ...)
  2022-09-03  5:31 ` bugzilla-daemon
@ 2022-09-03  5:37 ` bugzilla-daemon
  2022-09-06 15:52   ` Sean Christopherson
  2022-09-04  4:17 ` bugzilla-daemon
                   ` (6 subsequent siblings)
  22 siblings, 1 reply; 27+ messages in thread
From: bugzilla-daemon @ 2022-09-03  5:37 UTC (permalink / raw)
  To: kvm

https://bugzilla.kernel.org/show_bug.cgi?id=216388

--- Comment #15 from Robert Dinse (nanook@eskimo.com) ---
Please forgive my lack of knowledge regarding git, but is there a way to get a
patch that took the kernel from 5.18.19 to 5.19.0 now that earlier releases of
5.19.x are not on the kernel.org site?  I know there is a patch that goes from
5.18.19 to 5.19.6 and one that goes 5.19.5 to 5.19.6 but I just want to look at
the changes between 5.18.19 and 5.19.0.


* [Bug 216388] On Host, kernel errors in KVM, on guests, it shows CPU stalls
  2022-08-21  7:37 [Bug 216388] New: On Host, kernel errors in KVM, on guests, it shows CPU stalls bugzilla-daemon
                   ` (15 preceding siblings ...)
  2022-09-03  5:37 ` bugzilla-daemon
@ 2022-09-04  4:17 ` bugzilla-daemon
  2022-09-04  5:41 ` bugzilla-daemon
                   ` (5 subsequent siblings)
  22 siblings, 0 replies; 27+ messages in thread
From: bugzilla-daemon @ 2022-09-04  4:17 UTC (permalink / raw)
  To: kvm

https://bugzilla.kernel.org/show_bug.cgi?id=216388

--- Comment #16 from Robert Dinse (nanook@eskimo.com) ---
*** Bug 216399 has been marked as a duplicate of this bug. ***


* [Bug 216388] On Host, kernel errors in KVM, on guests, it shows CPU stalls
  2022-08-21  7:37 [Bug 216388] New: On Host, kernel errors in KVM, on guests, it shows CPU stalls bugzilla-daemon
                   ` (16 preceding siblings ...)
  2022-09-04  4:17 ` bugzilla-daemon
@ 2022-09-04  5:41 ` bugzilla-daemon
  2022-09-05  4:06 ` bugzilla-daemon
                   ` (4 subsequent siblings)
  22 siblings, 0 replies; 27+ messages in thread
From: bugzilla-daemon @ 2022-09-04  5:41 UTC (permalink / raw)
  To: kvm

https://bugzilla.kernel.org/show_bug.cgi?id=216388

--- Comment #17 from Robert Dinse (nanook@eskimo.com) ---
5.18.19 has run for a day on my four busiest servers and on my workstation
without errors, whereas 5.19.0 would generally generate expedited CPU stall
warnings within minutes of boot.  So it is definitely broken from 5.18.19 -> 5.19.0.


* [Bug 216388] On Host, kernel errors in KVM, on guests, it shows CPU stalls
  2022-08-21  7:37 [Bug 216388] New: On Host, kernel errors in KVM, on guests, it shows CPU stalls bugzilla-daemon
                   ` (17 preceding siblings ...)
  2022-09-04  5:41 ` bugzilla-daemon
@ 2022-09-05  4:06 ` bugzilla-daemon
  2022-09-06 15:52 ` bugzilla-daemon
                   ` (3 subsequent siblings)
  22 siblings, 0 replies; 27+ messages in thread
From: bugzilla-daemon @ 2022-09-05  4:06 UTC (permalink / raw)
  To: kvm

https://bugzilla.kernel.org/show_bug.cgi?id=216388

--- Comment #18 from Robert Dinse (nanook@eskimo.com) ---
Since 6.0.0-rc4 came out today I decided to give it a try.  The earlier release
candidates each had problems.  rc3 did not work with my Intel graphics: it
booted and ran okay but gave no display.  rc2 had issues where on some websites
such as odysee.com video would not play at all, even though it was fine on
youtube, and on others like bitchute video would play but with no audio.  rc1
ran for three hours then hard-locked; not even the magic SysRq key worked.

But the 4th time's a charm, it would seem.  rc4 drove the display properly
again, and video worked on all the websites.  And I didn't get any of the RCU
expedited CPU stalls AND it FLIES!  My PHP-based WordPress website loads in an
awesome 38ms!  And at least half of that is network latency between where I am
and where my servers are located.

So I'm not going to continue to pursue 5.19.  I don't feel really comfortable
using release candidates for live workloads, but this is working better than
anything ever has.

If 5.19 isn't going to become a long-term support release, perhaps we should
just close this ticket as "will not fix".


* Re: [Bug 216388] On Host, kernel errors in KVM, on guests, it shows CPU stalls
  2022-09-03  5:37 ` bugzilla-daemon
@ 2022-09-06 15:52   ` Sean Christopherson
  0 siblings, 0 replies; 27+ messages in thread
From: Sean Christopherson @ 2022-09-06 15:52 UTC (permalink / raw)
  To: bugzilla-daemon; +Cc: kvm

On Sat, Sep 03, 2022, bugzilla-daemon@kernel.org wrote:
> https://bugzilla.kernel.org/show_bug.cgi?id=216388
> 
> --- Comment #15 from Robert Dinse (nanook@eskimo.com) ---
> Please forgive my lack of knowledge regarding git, but is there a way to get a
> patch that took the kernel from 5.18.19 to 5.19.0 now that earlier releases of
> 5.19.x are not on the kernel.org site?

Strictly speaking, no.  Stable branches, i.e. v5.18.x in this case, are effectively
forks.  After v5.18.0, everything that goes into v5.18.y is a unique commit, even
if bug fixes are based on an upstream (master branch) commit.

Visually, it's something like this.

v5.18.0 --> v5.18.1 --> v5.18.2 --> v5.18.y
\
 -> ... -> v5.19.0 -> v5.19.1
           \
            -> ... -> v5.20
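
The fork structure drawn above can be checked in a real checkout with
something like the sketch below. It assumes a clone of the linux-stable tree,
since the v5.18.y tags are not in Linus's tree; fork_point is a helper name
made up for illustration.

```shell
#!/bin/sh
# Print the commit at which two tags' histories diverge.
# $1 = path to the git checkout, $2 and $3 = the two tags to compare.
fork_point() {
    git -C "$1" merge-base "$2" "$3"
}

# Hypothetical usage (the path is a placeholder):
#   fork_point ~/linux-stable v5.18.19 v5.19
# If the picture above is right, this should name the same commit as
#   git -C ~/linux-stable rev-parse 'v5.18^{commit}'
```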


IIUC, in this situation v5.18.0 isn't stable enough to test on its own, but the
v5.18.19 candidate is fully healthy.  In that case, if you wanted to bisect between
v5.18.0 and v5.19.0 to figure out what broke in v5.19, the least awful approach
would be to first find what commit(s) between v5.18.0 and v5.18.19 fixed the unrelated
instability in v5.18.0, and then manually apply those commit(s) at every stage when
bisecting between v5.18.0 and v5.19.0 to identify the buggy commit that introduced
the CPU/RCU stalls.
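
A rough sketch of that bisect workflow follows. It assumes a git checkout that
has both the v5.18 and v5.19 tags; FIX1 and FIX2 are hypothetical placeholders
for whatever commit(s) turn out to fix the unrelated v5.18.0 instability, and
apply_fixes/drop_fixes are helper names made up for illustration.

```shell
#!/bin/sh
# Temporarily apply known fix commits on top of whatever revision
# "git bisect" has checked out, without creating any new commits.
# $1 = path to the checkout, remaining arguments = fix commits.
apply_fixes() {
    repo=$1
    shift
    for fix in "$@"; do
        git -C "$repo" cherry-pick --no-commit "$fix" || return 1
    done
}

# Discard the temporary fixes so bisect can check out the next revision.
drop_fixes() {
    git -C "$1" reset --hard --quiet
}

# Hypothetical session, run from inside the checkout:
#   git bisect start v5.19 v5.18     # bad revision first, then good
#   apply_fixes . FIX1 FIX2          # then build, boot, and test
#   drop_fixes .
#   git bisect good                  # or: git bisect bad
#   ...repeat the last three steps until bisect names a commit...
```

Using --no-commit keeps the fixes out of the history that bisect is walking,
at the cost of having to reset before marking each step.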

> I know there is a patch that goes from 5.18.19 to 5.19.6

I assume you mean v5.18.19 => v5.18.20?

> and one that goes 5.19.5 to 5.19.6 but I just want to look at the changes
> between 5.18.19 and 5.19.0.

If you just want to look at the changes, you can always do

	git diff <commit A>..<commit B>

e.g.

	git diff v5.18.18..v5.19

but that's going to show _all_ changes in a single diff, i.e. pinpointing exactly
what change broke/fixed something is extremely difficult.
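
One way to make that less overwhelming is to list the individual commits and
restrict them to a few directories. The sketch below assumes a checkout with
the relevant tags; the helper name and the choice of paths are illustrative
guesses, not a claim about where the bug lives.

```shell
#!/bin/sh
# List non-merge commits reachable from $3 but not from $2, limited to
# the given paths. $1 = path to the checkout; any remaining arguments
# are paths inside the tree (none means the whole tree).
commits_touching() {
    repo=$1 from=$2 to=$3
    shift 3
    git -C "$repo" log --oneline --no-merges "$from..$to" -- "$@"
}

# Hypothetical usage (repository path and directories are placeholders):
#   commits_touching ~/linux v5.18 v5.19 arch/x86/kvm virt/kvm kernel/rcu
```

Each commit in the output can then be inspected on its own with git show.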


* [Bug 216388] On Host, kernel errors in KVM, on guests, it shows CPU stalls
  2022-08-21  7:37 [Bug 216388] New: On Host, kernel errors in KVM, on guests, it shows CPU stalls bugzilla-daemon
                   ` (18 preceding siblings ...)
  2022-09-05  4:06 ` bugzilla-daemon
@ 2022-09-06 15:52 ` bugzilla-daemon
  2022-09-06 21:44 ` bugzilla-daemon
                   ` (2 subsequent siblings)
  22 siblings, 0 replies; 27+ messages in thread
From: bugzilla-daemon @ 2022-09-06 15:52 UTC (permalink / raw)
  To: kvm

https://bugzilla.kernel.org/show_bug.cgi?id=216388

--- Comment #19 from Sean Christopherson (seanjc@google.com) ---
On Sat, Sep 03, 2022, bugzilla-daemon@kernel.org wrote:
> https://bugzilla.kernel.org/show_bug.cgi?id=216388
> 
> --- Comment #15 from Robert Dinse (nanook@eskimo.com) ---
> Please forgive my lack of knowledge regarding git, but is there a way to get a
> patch that took the kernel from 5.18.19 to 5.19.0 now that earlier releases of
> 5.19.x are not on the kernel.org site?

Strictly speaking, no.  Stable branches, i.e. v5.18.x in this case, are effectively
forks.  After v5.18.0, everything that goes into v5.18.y is a unique commit, even
if bug fixes are based on an upstream (master branch) commit.

Visually, it's something like this.

v5.18.0 --> v5.18.1 --> v5.18.2 --> v5.18.y
\
 -> ... -> v5.19.0 -> v5.19.1
           \
            -> ... -> v5.20


IIUC, in this situation v5.18.0 isn't stable enough to test on its own, but the
v5.18.19 candidate is fully healthy.  In that case, if you wanted to bisect between
v5.18.0 and v5.19.0 to figure out what broke in v5.19, the least awful approach
would be to first find what commit(s) between v5.18.0 and v5.18.19 fixed the unrelated
instability in v5.18.0, and then manually apply those commit(s) at every stage when
bisecting between v5.18.0 and v5.19.0 to identify the buggy commit that introduced
the CPU/RCU stalls.

> I know there is a patch that goes from 5.18.19 to 5.19.6

I assume you mean v5.18.19 => v5.18.20?

> and one that goes 5.19.5 to 5.19.6 but I just want to look at the changes
> between 5.18.19 and 5.19.0.

If you just want to look at the changes, you can always do

        git diff <commit A>..<commit B>

e.g.

        git diff v5.18.18..v5.19

but that's going to show _all_ changes in a single diff, i.e. pinpointing exactly
what change broke/fixed something is extremely difficult.


* [Bug 216388] On Host, kernel errors in KVM, on guests, it shows CPU stalls
  2022-08-21  7:37 [Bug 216388] New: On Host, kernel errors in KVM, on guests, it shows CPU stalls bugzilla-daemon
                   ` (19 preceding siblings ...)
  2022-09-06 15:52 ` bugzilla-daemon
@ 2022-09-06 21:44 ` bugzilla-daemon
  2022-09-17 19:53 ` bugzilla-daemon
  2022-09-17 20:23 ` bugzilla-daemon
  22 siblings, 0 replies; 27+ messages in thread
From: bugzilla-daemon @ 2022-09-06 21:44 UTC (permalink / raw)
  To: kvm

https://bugzilla.kernel.org/show_bug.cgi?id=216388

--- Comment #20 from Robert Dinse (nanook@eskimo.com) ---
At this point 6.0-rc4 is running flawlessly, so whatever was broken in 5.19 is
fixed in 6.0.  If 5.19 is going to be a long-term support release then it's
worth continuing to pursue, but if not, there is little point.


* [Bug 216388] On Host, kernel errors in KVM, on guests, it shows CPU stalls
  2022-08-21  7:37 [Bug 216388] New: On Host, kernel errors in KVM, on guests, it shows CPU stalls bugzilla-daemon
                   ` (20 preceding siblings ...)
  2022-09-06 21:44 ` bugzilla-daemon
@ 2022-09-17 19:53 ` bugzilla-daemon
  2022-09-17 20:23 ` bugzilla-daemon
  22 siblings, 0 replies; 27+ messages in thread
From: bugzilla-daemon @ 2022-09-17 19:53 UTC (permalink / raw)
  To: kvm

https://bugzilla.kernel.org/show_bug.cgi?id=216388

--- Comment #21 from Robert Dinse (nanook@eskimo.com) ---
Well shite!  6.0-rc4 ran perfectly, but 6.0-rc5 is back to massive CPU stalls
just like 5.19.  AAAARRRRGGGGHHHH!


* [Bug 216388] On Host, kernel errors in KVM, on guests, it shows CPU stalls
  2022-08-21  7:37 [Bug 216388] New: On Host, kernel errors in KVM, on guests, it shows CPU stalls bugzilla-daemon
                   ` (21 preceding siblings ...)
  2022-09-17 19:53 ` bugzilla-daemon
@ 2022-09-17 20:23 ` bugzilla-daemon
  22 siblings, 0 replies; 27+ messages in thread
From: bugzilla-daemon @ 2022-09-17 20:23 UTC (permalink / raw)
  To: kvm

https://bugzilla.kernel.org/show_bug.cgi?id=216388

Robert Dinse (nanook@eskimo.com) changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|---                         |MOVED

--- Comment #22 from Robert Dinse (nanook@eskimo.com) ---
Since this is happening all over, not just in KVM code, AND since it's happening
even on machines with no KVM-QEMU guests, this ticket is targeting the wrong
code, so I'm closing it and opening a new ticket with more current and
extensive details.

