kvm.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: bugzilla-daemon@kernel.org
To: kvm@vger.kernel.org
Subject: [Bug 216388] On Host, kernel errors in KVM, on guests, it shows CPU stalls
Date: Mon, 22 Aug 2022 17:50:41 +0000	[thread overview]
Message-ID: <bug-216388-28872-roPQeJtu1V@https.bugzilla.kernel.org/> (raw)
In-Reply-To: <bug-216388-28872@https.bugzilla.kernel.org/>

https://bugzilla.kernel.org/show_bug.cgi?id=216388

--- Comment #1 from Sean Christopherson (seanjc@google.com) ---
+GVT folks

On Sun, Aug 21, 2022, bugzilla-daemon@kernel.org wrote:
> https://bugzilla.kernel.org/show_bug.cgi?id=216388
> 
>             Bug ID: 216388
>            Summary: On Host, kernel errors in KVM, on guests, it shows CPU
>                     stalls
>            Product: Virtualization
>            Version: unspecified
>     Kernel Version: 5.19.0 / 5.19.1 / 5.19.2
>           Hardware: All
>                 OS: Linux
>               Tree: Mainline
>             Status: NEW
>           Severity: high
>           Priority: P1
>          Component: kvm
>           Assignee: virtualization_kvm@kernel-bugs.osdl.org
>           Reporter: nanook@eskimo.com
>         Regression: No
> 
> Created attachment 301614
>   --> https://bugzilla.kernel.org/attachment.cgi?id=301614&action=edit
> The configuration file used to Comile this kernel.
> 
> This behavior has persisted across 5.19.0, 5.19.1, and 5.19.2.  While the
> kernel I am taking this example from is tainted (owing to using Intel
> development drivers for GPU virtualization), it is also occurring on
> non-tainted kernels on servers with no development or third party modules
> installed.
> 
> INFO: task CPU 2/KVM:2343 blocked for more than 1228 seconds.
> [207177.050049]       Tainted: G     U    I       5.19.2 #1
> [207177.050050] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables
> this message.
> [207177.050051] task:CPU 2/KVM       state:D stack:    0 pid: 2343 ppid:    
> 1
> flags:0x00000002
> [207177.050054] Call Trace:
> [207177.050055]  <TASK>
> [207177.050056]  __schedule+0x359/0x1400
> [207177.050060]  ? kvm_mmu_page_fault+0x1ee/0x980
> [207177.050062]  ? kvm_set_msr_common+0x31f/0x1060
> [207177.050065]  schedule+0x5f/0x100
> [207177.050066]  schedule_preempt_disabled+0x15/0x30
> [207177.050068]  __mutex_lock.constprop.0+0x4e2/0x750
> [207177.050070]  ? aa_file_perm+0x124/0x4f0
> [207177.050071]  __mutex_lock_slowpath+0x13/0x20
> [207177.050072]  mutex_lock+0x25/0x30
> [207177.050075]  intel_vgpu_emulate_mmio_read+0x5d/0x3b0 [kvmgt]

This isn't a KVM problem, it's a KVMGT problem (despite the name, KVMGT is very
much not KVM).

> [207177.050084]  intel_vgpu_rw+0xb8/0x1c0 [kvmgt]
> [207177.050091]  intel_vgpu_read+0x20d/0x250 [kvmgt]
> [207177.050097]  vfio_device_fops_read+0x1f/0x40
> [207177.050100]  vfs_read+0x9b/0x160
> [207177.050102]  __x64_sys_pread64+0x93/0xd0
> [207177.050104]  do_syscall_64+0x58/0x80
> [207177.050106]  ? kvm_on_user_return+0x84/0xe0
> [207177.050107]  ? fire_user_return_notifiers+0x37/0x70
> [207177.050109]  ? exit_to_user_mode_prepare+0x41/0x200
> [207177.050111]  ? syscall_exit_to_user_mode+0x1b/0x40
> [207177.050112]  ? do_syscall_64+0x67/0x80
> [207177.050114]  ? irqentry_exit+0x54/0x70
> [207177.050115]  ? sysvec_call_function_single+0x4b/0xa0
> [207177.050116]  entry_SYSCALL_64_after_hwframe+0x63/0xcd
> [207177.050118] RIP: 0033:0x7ff51131293f
> [207177.050119] RSP: 002b:00007ff4ddffa260 EFLAGS: 00000293 ORIG_RAX:
> 0000000000000011
> [207177.050121] RAX: ffffffffffffffda RBX: 00005599a6835420 RCX:
> 00007ff51131293f
> [207177.050122] RDX: 0000000000000004 RSI: 00007ff4ddffa2a8 RDI:
> 0000000000000027
> [207177.050123] RBP: 0000000000000004 R08: 0000000000000000 R09:
> 00000000ffffffff
> [207177.050124] R10: 0000000000065f10 R11: 0000000000000293 R12:
> 0000000000065f10
> [207177.050124] R13: 00005599a6835330 R14: 0000000000000004 R15:
> 0000000000065f10
> [207177.050126]  </TASK>
> 
>      I am seeing this on Intel i7-6700k, i7-6850k, and i7-9700k platforms.
> 
>      This did not happen on 5.17 kernels, and 5.18 kernels never ran stable
> enough on my platforms to actually run them for more than a few minutes.
> 
>      Likewise 6.0-rc1 has not been stable enough to run in production.  After
> less than three hours running on my workstation it locked hard with even the
> magic sys-request key being unresponsive and only power cycling the machine
> got
> it back.
> 
>      The operating system in use for the host on all machines is Ubuntu
>      22.04.
> 
>      Guests vary with Ubuntu 22.04 being the most common but also Mint,
>      Debian,
> Manjaro, Centos, Fedora, ScientificLinux, Zorin, and Windows being in use.
> 
>      I see the same issue manifest on platforms running only Ubuntu guests as
> with guests of varying operating systems.  
> 
>      The configuration file I used to compile this kernel is attached.  I
> compiled it with gcc 12.1.0.
> 
>      This behavior does not manifest itself instantly, typically the machine
> needs to be running 3-7 days before it does.  Once it does guests keep
> stalling
> and restarting libvirtd does not help.  Only thing that seems to is a hard
> reboot of the physical host.  For this reason I believe the issue lies
> strictly
> with the host and not the guests.
> 
>      I have listed it as a severity of high since it is completely service
> interrupting.
> 
> -- 
> You may reply to this email to add a comment.
> 
> You are receiving this mail because:
> You are watching the assignee of the bug.

-- 
You may reply to this email to add a comment.

You are receiving this mail because:
You are watching the assignee of the bug.

  parent reply	other threads:[~2022-08-22 17:51 UTC|newest]

Thread overview: 27+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-08-21  7:37 [Bug 216388] New: On Host, kernel errors in KVM, on guests, it shows CPU stalls bugzilla-daemon
2022-08-22 17:50 ` Sean Christopherson
2022-08-22 23:21   ` Zhenyu Wang
2022-08-22 17:50 ` bugzilla-daemon [this message]
2022-08-22 23:46 ` [Bug 216388] " bugzilla-daemon
2022-08-23  0:57 ` bugzilla-daemon
2022-08-27 19:42 ` bugzilla-daemon
2022-08-28 21:08 ` bugzilla-daemon
2022-09-01  6:09 ` bugzilla-daemon
2022-09-01 16:44   ` Sean Christopherson
2022-09-01 16:44 ` bugzilla-daemon
2022-09-01 19:46 ` bugzilla-daemon
2022-09-01 21:37 ` bugzilla-daemon
2022-09-02  5:46 ` bugzilla-daemon
2022-09-02  8:36 ` bugzilla-daemon
2022-09-03  1:37 ` bugzilla-daemon
2022-09-03  2:03 ` bugzilla-daemon
2022-09-03  5:31 ` bugzilla-daemon
2022-09-03  5:37 ` bugzilla-daemon
2022-09-06 15:52   ` Sean Christopherson
2022-09-04  4:17 ` bugzilla-daemon
2022-09-04  5:41 ` bugzilla-daemon
2022-09-05  4:06 ` bugzilla-daemon
2022-09-06 15:52 ` bugzilla-daemon
2022-09-06 21:44 ` bugzilla-daemon
2022-09-17 19:53 ` bugzilla-daemon
2022-09-17 20:23 ` bugzilla-daemon

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=bug-216388-28872-roPQeJtu1V@https.bugzilla.kernel.org/ \
    --to=bugzilla-daemon@kernel.org \
    --cc=kvm@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).