* qemu takes 100% of a core, freezes the VM
@ 2018-01-31 15:56 JimR
  2018-02-01 16:52 ` Stefan Hajnoczi
  0 siblings, 1 reply; 4+ messages in thread
From: JimR @ 2018-01-31 15:56 UTC (permalink / raw)
  To: kvm

Host:  Fedora 26 with all patches on HP Pavilion 4-core 3.2 GHz

VMM 1.4.3

Guest: RHEL 7.4, server with GUI. (also CentOS 7 server with GUI, but 
never running at the same time as rhel)

Guest invariably freezes, sometimes after 5 minutes, sometimes after 45 
minutes.  It will not accept any keyboard nor mouse input.  This happens 
when the only application running in guest is the terminal, but it is 
not running anything, just waiting for my next command.

VMM shows CPU usage spikes and stays there.  Host htop shows qemu is 
taking 100% of one core.

From VMM, I can Force Reset or Force Off.  Sometimes the restart works 
successfully, sometimes I have to quit VMM and kill all qemu PIDs.

Is there any known fix or workaround for this?  I am trying to take an 
RHCSA course, and it is very disruptive to have it freeze in the middle 
of a session.

Thanks,

-- 
73 de Jim, KD1YV


* Re: qemu takes 100% of a core, freezes the VM
  2018-01-31 15:56 qemu takes 100% of a core, freezes the VM JimR
@ 2018-02-01 16:52 ` Stefan Hajnoczi
  2018-02-02 15:52   ` JimR
  0 siblings, 1 reply; 4+ messages in thread
From: Stefan Hajnoczi @ 2018-02-01 16:52 UTC (permalink / raw)
  To: JimR; +Cc: kvm

On Wed, Jan 31, 2018 at 10:56:47AM -0500, JimR wrote:
> Host:  Fedora 26 with all patches on HP Pavilion 4-core 3.2 GHz
> 
> VMM 1.4.3
> 
> Guest: RHEL 7.4, server with GUI. (also CentOS 7 server with GUI, but never
> running at the same time as rhel)
> 
> Guest invariably freezes, sometimes after 5 minutes, sometimes after 45
> minutes.  It will not accept any keyboard nor mouse input.  This happens
> when the only application running in guest is the terminal, but it is not
> running anything, just waiting for my next command.
> 
> VMM shows CPU usage spikes and stays there.  Host htop shows qemu is taking
> 100% of one core.

Please post the output of "mpstat -P ALL 1".  mpstat is from the sysstat
package.

If you see 100% %usr then QEMU is spinning.

If you see 100% %guest then the guest is spinning.

The next step would be to drill down on what activity is taking 100%
CPU.
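
The %usr versus %guest check described above can be sketched in Python. This is only an illustration, not part of sysstat: the function name `classify_mpstat`, the 90% threshold, and the sample text are all made up for the example (the sample is not the output from this thread). It assumes the column layout printed by recent sysstat versions, where the header row contains both %usr and %guest.

```python
def classify_mpstat(text, threshold=90.0):
    """Map each busy CPU to 'qemu' (%usr high) or 'guest' (%guest high).

    Columns are located by name from the header row, so a leading
    timestamp (with or without AM/PM) does not matter.
    """
    columns = None
    verdicts = {}
    for line in text.splitlines():
        fields = line.split()
        # A header row names the columns; remember their positions.
        if "CPU" in fields and "%usr" in fields and "%guest" in fields:
            columns = {name: i for i, name in enumerate(fields)}
            continue
        if columns is None or len(fields) != len(columns):
            continue
        cpu = fields[columns["CPU"]]
        if cpu == "all":
            continue  # skip the aggregate row, keep per-CPU rows
        try:
            usr = float(fields[columns["%usr"]])
            guest = float(fields[columns["%guest"]])
        except ValueError:
            continue
        if usr >= threshold:
            verdicts[cpu] = "qemu"    # time spent in host userspace
        elif guest >= threshold:
            verdicts[cpu] = "guest"   # time spent running guest code
    return verdicts


# Invented sample for illustration only.
SAMPLE = """\
09:40:01 AM  CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest  %gnice   %idle
09:40:02 AM  all    0.25    0.00    0.25    0.00    0.00    0.00    0.00   25.00    0.00   74.50
09:40:02 AM    0    1.00    0.00    1.00    0.00    0.00    0.00    0.00    0.00    0.00   98.00
09:40:02 AM    2    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00    0.00    0.00
"""

print(classify_mpstat(SAMPLE))
```

A CPU reported as 'guest' points at code running inside the VM; a CPU reported as 'qemu' points at the QEMU process itself.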

Have you installed the latest updates on the host and inside the guest?

Stefan


* Re: qemu takes 100% of a core, freezes the VM
  2018-02-01 16:52 ` Stefan Hajnoczi
@ 2018-02-02 15:52   ` JimR
  2018-02-08  9:24     ` Stefan Hajnoczi
  0 siblings, 1 reply; 4+ messages in thread
From: JimR @ 2018-02-02 15:52 UTC (permalink / raw)
  To: Stefan Hajnoczi; +Cc: kvm

On 02/01/2018 11:52 AM, Stefan Hajnoczi wrote:
> On Wed, Jan 31, 2018 at 10:56:47AM -0500, JimR wrote:
>> Host:  Fedora 26 with all patches on HP Pavilion 4-core 3.2 GHz
>>
>> VMM 1.4.3
>>
>> Guest: RHEL 7.4, server with GUI. (also CentOS 7 server with GUI, but never
>> running at the same time as rhel)
>>
>> Guest invariably freezes, sometimes after 5 minutes, sometimes after 45
>> minutes.  It will not accept any keyboard nor mouse input.  This happens
>> when the only application running in guest is the terminal, but it is not
>> running anything, just waiting for my next command.
>>
>> VMM shows CPU usage spikes and stays there.  Host htop shows qemu is taking
>> 100% of one core.
> Please post the output of "mpstat -P ALL 1".  mpstat is from the sysstat
> package.
>
> If you see 100% %usr then QEMU is spinning.
>
> If you see 100% %guest then the guest is spinning.
>
> The next step would be to drill down on what activity is taking 100%
> CPU.
>
> Have you installed the latest updates on the host and inside the guest?
>
> Stefan

I'm not sure if you received my reply yesterday.  It had a screen shot 
of htop embedded.  That seemed to bounce from majordomo.

Here is some additional info from a freeze this morning.  This is from 
the guest's /var/log/messages.  Note that the Call Trace repeated 7 
times just before the freeze.

Feb  2 09:39:48 rhcsa kernel: INFO: task systemd:1 blocked for more than 120 seconds.
Feb  2 09:39:48 rhcsa kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Feb  2 09:39:48 rhcsa kernel: systemd         D ffff88003dba0000 0     1      0 0x00000000
Feb  2 09:39:48 rhcsa kernel: Call Trace:
Feb  2 09:39:48 rhcsa kernel: [<ffffffff816ab8a9>] schedule+0x29/0x70
Feb  2 09:39:48 rhcsa kernel: [<ffffffff810625bf>] kvm_async_pf_task_wait+0x1df/0x230
Feb  2 09:39:48 rhcsa kernel: [<ffffffff810b3690>] ? wake_up_atomic_t+0x30/0x30
Feb  2 09:39:48 rhcsa kernel: [<ffffffff816afc00>] ? error_swapgs+0x61/0x18d
Feb  2 09:39:48 rhcsa kernel: [<ffffffff816afcef>] ? error_swapgs+0x150/0x18d
Feb  2 09:39:48 rhcsa kernel: [<ffffffff816b32d6>] do_async_page_fault+0x96/0xd0
Feb  2 09:39:48 rhcsa kernel: [<ffffffff816af928>] async_page_fault+0x28/0x30
Feb  2 09:39:48 rhcsa kernel: INFO: task crond:999 blocked for more than 120 seconds.
Feb  2 09:39:48 rhcsa kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Feb  2 09:39:48 rhcsa kernel: crond           D ffff880036421fa0 0   999      1 0x00000080

Thanks,
JimR


* Re: qemu takes 100% of a core, freezes the VM
  2018-02-02 15:52   ` JimR
@ 2018-02-08  9:24     ` Stefan Hajnoczi
  0 siblings, 0 replies; 4+ messages in thread
From: Stefan Hajnoczi @ 2018-02-08  9:24 UTC (permalink / raw)
  To: JimR; +Cc: kvm

On Fri, Feb 02, 2018 at 10:52:34AM -0500, JimR wrote:
> On 02/01/2018 11:52 AM, Stefan Hajnoczi wrote:
> > On Wed, Jan 31, 2018 at 10:56:47AM -0500, JimR wrote:
> > > Host:  Fedora 26 with all patches on HP Pavilion 4-core 3.2 GHz
> > > 
> > > VMM 1.4.3
> > > 
> > > Guest: RHEL 7.4, server with GUI. (also CentOS 7 server with GUI, but never
> > > running at the same time as rhel)
> > > 
> > > Guest invariably freezes, sometimes after 5 minutes, sometimes after 45
> > > minutes.  It will not accept any keyboard nor mouse input.  This happens
> > > when the only application running in guest is the terminal, but it is not
> > > running anything, just waiting for my next command.
> > > 
> > > VMM shows CPU usage spikes and stays there.  Host htop shows qemu is taking
> > > 100% of one core.
> > Please post the output of "mpstat -P ALL 1".  mpstat is from the sysstat
> > package.
> > 
> > If you see 100% %usr then QEMU is spinning.
> > 
> > If you see 100% %guest then the guest is spinning.
> > 
> > The next step would be to drill down on what activity is taking 100%
> > CPU.
> > 
> > Have you installed the latest updates on the host and inside the guest?
> > 
> > Stefan
> 
> I'm not sure if you received my reply yesterday.  It had a screen shot of
> htop embedded.  That seemed to bounce from majordomo.

The mpstat output you posted had 100% %guest and low %usr utilization.
This suggests the hang is not within the QEMU process on the host.  It's
the guest that is consuming a lot of CPU.

> Here is some additional info from a freeze this morning.  This is from the
> guest's /var/log/messages.  Note that the Call Trace repeated 7 times just
> before the freeze.
> 
> Feb  2 09:39:48 rhcsa kernel: INFO: task systemd:1 blocked for more than 120 seconds.
> Feb  2 09:39:48 rhcsa kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> Feb  2 09:39:48 rhcsa kernel: systemd         D ffff88003dba0000 0     1      0 0x00000000
> Feb  2 09:39:48 rhcsa kernel: Call Trace:
> Feb  2 09:39:48 rhcsa kernel: [<ffffffff816ab8a9>] schedule+0x29/0x70
> Feb  2 09:39:48 rhcsa kernel: [<ffffffff810625bf>] kvm_async_pf_task_wait+0x1df/0x230
> Feb  2 09:39:48 rhcsa kernel: [<ffffffff810b3690>] ? wake_up_atomic_t+0x30/0x30
> Feb  2 09:39:48 rhcsa kernel: [<ffffffff816afc00>] ? error_swapgs+0x61/0x18d
> Feb  2 09:39:48 rhcsa kernel: [<ffffffff816afcef>] ? error_swapgs+0x150/0x18d
> Feb  2 09:39:48 rhcsa kernel: [<ffffffff816b32d6>] do_async_page_fault+0x96/0xd0
> Feb  2 09:39:48 rhcsa kernel: [<ffffffff816af928>] async_page_fault+0x28/0x30
> Feb  2 09:39:48 rhcsa kernel: INFO: task crond:999 blocked for more than 120 seconds.
> Feb  2 09:39:48 rhcsa kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> Feb  2 09:39:48 rhcsa kernel: crond           D ffff880036421fa0 0   999      1 0x00000080

Weird, it looks like the guest took an async page fault
(kvm_async_pf_task_wait in the trace) and hung when trying to schedule
another task while the hypervisor resolves the page fault.
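
To see which tasks the guest kernel reported as hung, and how often, the "blocked for more than N seconds" lines can be tallied. The following is a minimal sketch, not an existing tool: `hung_tasks` and the regular expression are assumptions for illustration, and the input is expected to be log lines as they appear in the guest's /var/log/messages, one entry per line (not re-wrapped by a mail client). The sample uses the two reports from the log excerpt in this thread.

```python
import re
from collections import Counter

# Matches the hung-task watchdog line, capturing task name and PID.
HUNG_RE = re.compile(r"INFO: task (.+?):(\d+) blocked for more than (\d+) seconds")


def hung_tasks(log_text):
    """Count hung-task reports per (task name, pid)."""
    counts = Counter()
    for line in log_text.splitlines():
        match = HUNG_RE.search(line)
        if match:
            counts[(match.group(1), int(match.group(2)))] += 1
    return counts


# The two watchdog lines from the excerpt above, one entry per line.
SAMPLE_LOG = """\
Feb  2 09:39:48 rhcsa kernel: INFO: task systemd:1 blocked for more than 120 seconds.
Feb  2 09:39:48 rhcsa kernel: INFO: task crond:999 blocked for more than 120 seconds.
"""

for (name, pid), n in hung_tasks(SAMPLE_LOG).items():
    print(name, pid, n)
```

If nearly every task on the system shows up, the stall is probably not specific to one process.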

I hope someone else has ideas on what to check next.

My next idea is low-level debugging and might be too time-consuming for
you:

I would use "perf record -a -e kvm:\*" on the host while the guest is
hung and then "perf script" to view the trace log.  It contains all
vmenter/vmexit activity and might contain a clue about what the guest is
trying to do.

The "perf kvm" command might be useful in showing what's going on inside
the guest.  It profiles CPU activity inside the guest kernel.
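
A trace of every vmexit gets long quickly, so a first pass could simply count exit reasons in the "perf script" text. The sketch below assumes that kvm:kvm_exit lines contain the text "reason NAME", which is how the tracepoint is usually printed, but the exact format varies between kernel versions. The function name `exit_reasons` and the sample lines are invented for illustration; they are not output from this thread.

```python
import re
from collections import Counter

# Assumed line shape: "... kvm:kvm_exit: reason EPT_VIOLATION rip 0x... info ..."
EXIT_RE = re.compile(r"kvm:kvm_exit:.*?\breason\s+(\S+)")


def exit_reasons(perf_script_text):
    """Count vmexit reasons found in `perf script` output text."""
    counts = Counter()
    for line in perf_script_text.splitlines():
        match = EXIT_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# Invented sample lines, only to show the expected shape of the input.
SAMPLE_TRACE = """\
 qemu-system-x86  4321 [002] 51234.000100: kvm:kvm_exit: reason EPT_VIOLATION rip 0xffffffff816af928 info 181 0
 qemu-system-x86  4321 [002] 51234.000140: kvm:kvm_entry: vcpu 0
 qemu-system-x86  4321 [002] 51234.000180: kvm:kvm_exit: reason EPT_VIOLATION rip 0xffffffff816af928 info 181 0
 qemu-system-x86  4321 [002] 51234.000220: kvm:kvm_exit: reason HLT rip 0xffffffff81060eb6 info 0 0
"""

for reason, n in exit_reasons(SAMPLE_TRACE).most_common():
    print(reason, n)
```

The text can come from a file saved with "perf script > trace.txt". A single reason dominating the count while the guest is hung would be the place to look first.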

Stefan
