qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
From: "Denis V. Lunev" <den-lists@parallels.com>
To: Paolo Bonzini <pbonzini@redhat.com>,
	Jim Minter <jminter@redhat.com>,
	qemu-devel <qemu-devel@nongnu.org>,
	Hannes Reinecke <hare@suse.de>
Cc: Roman Kagan <rkagan@virtuozzo.com>
Subject: Re: [Qemu-devel] sda abort with virtio-scsi
Date: Thu, 4 Feb 2016 14:00:24 +0300	[thread overview]
Message-ID: <56B32F48.5070804@parallels.com> (raw)
In-Reply-To: <56B326B4.1020407@redhat.com>

On 02/04/2016 01:23 PM, Paolo Bonzini wrote:
>
> On 04/02/2016 00:34, Jim Minter wrote:
>> I was worried there was
>> some way in which the contention could cause an abort and perhaps thence
>> the lockup (which does not seem to recover when the host load goes down).
> I don't know... It's not the most tested code, but it is not very
> complicated either.
>
> The certain points that can be extracted from the kernel messages are:
> 1) there was a cancellation request that took a long time, >20 seconds;
> 2) despite taking a long time, it _did_ recover sooner or later because
> otherwise you'd not have the lockup splat either.
>
> Paolo
>
>>> Firing the NMI watchdog is fixed in more recent QEMU, which has
>>> asynchronous cancellation, assuming you're running RHEL's QEMU 1.5.3
>>> (try /usr/libexec/qemu-kvm --version, or rpm -qf /usr/libexec/qemu-kvm).
>> /usr/libexec/qemu-kvm --version reports QEMU emulator version 1.5.3
>> (qemu-kvm-1.5.3-105.el7_2.3)

my $0.02 to the account. This could be related or could be
not related.

speaking about NMI we do observe similar problems on older
AMD hosts with big enough number of VCPUs in guest.
On a simple boot we see something like this (the probability to
face the problem is around 1/10). RHEV 2.3 QEMU is used, the same
kernel is running on host node:

[   72.189005] Kernel panic - not syncing: softlockup: hung tasks
[   72.189005] CPU: 5 PID: 593 Comm: systemd-udevd Tainted: G             L ------------   3.10.0-327.4.4.el7.x86_64 #1
[   72.189005] Hardware name: Red Hat KVM, BIOS seabios-1.7.5-11.vz7.2 04/01/2014
[   72.189005]  ffffffff81871a03 0000000050291887 ffff88007fd43e18 ffffffff8163515c
[   72.189005]  ffff88007fd43e98 ffffffff8162e9d7 0000000000000008 ffff88007fd43ea8
[   72.189005]  ffff88007fd43e48 0000000050291887 0000000000002710 0000000000000000
[   72.189005] Call Trace:
[   72.189005]  <IRQ>  [<ffffffff8163515c>] dump_stack+0x19/0x1b
[   72.189005]  [<ffffffff8162e9d7>] panic+0xd8/0x1e7
[   72.189005]  [<ffffffff8111b376>] watchdog_timer_fn+0x1b6/0x1c0
[   72.189005]  [<ffffffff8111b1c0>] ? watchdog_enable+0xc0/0xc0
[   72.189005]  [<ffffffff810a9d42>] __hrtimer_run_queues+0xd2/0x260
[   72.189005]  [<ffffffff810aa2e0>] hrtimer_interrupt+0xb0/0x1e0
[   72.189005]  [<ffffffff816471dc>] ? call_softirq+0x1c/0x30
[   72.189005]  [<ffffffff81065d40>] ? flush_tlb_func+0xb0/0xb0
[   72.189005]  [<ffffffff81049537>] local_apic_timer_interrupt+0x37/0x60
[   72.189005]  [<ffffffff81647e4f>] smp_apic_timer_interrupt+0x3f/0x60
[   72.189005]  [<ffffffff8164651d>] apic_timer_interrupt+0x6d/0x80
[   72.189005]  <EOI>  [<ffffffff812f27e9>] ? free_cpumask_var+0x9/0x10
[   72.189005]  [<ffffffff810e6de2>] ? smp_call_function_many+0x202/0x260
[   72.189005]  [<ffffffff81065d40>] ? flush_tlb_func+0xb0/0xb0
[   72.189005]  [<ffffffff810e6e9d>] on_each_cpu+0x2d/0x60
[   72.189005]  [<ffffffff81066119>] flush_tlb_kernel_range+0x59/0xa0
[   72.189005]  [<ffffffff811a4730>] __purge_vmap_area_lazy+0x1a0/0x210
[   72.189005]  [<ffffffff811a4927>] vm_unmap_aliases+0x187/0x1b0
[   72.189005]  [<ffffffff810620e8>] change_page_attr_set_clr+0xe8/0x4d0
[   72.189005]  [<ffffffff81131127>] ? ring_buffer_time_stamp+0x7/0x10
[   72.189005]  [<ffffffff8106299f>] set_memory_ro+0x2f/0x40
[   72.189005]  [<ffffffff8162f7c4>] set_section_ro_nx+0x3a/0x71
[   72.189005]  [<ffffffff810ed19c>] load_module+0x103c/0x1b50
[   72.189005]  [<ffffffff810e9743>] ? copy_module_from_fd.isra.42+0x53/0x150
[   72.189005]  [<ffffffff810ede66>] SyS_finit_module+0xa6/0xd0
[   72.189005]  [<ffffffff816458c9>] system_call_fastpath+0x16/0x1b
[   72.189005] Shutting down cpus with NMI


Timesources changes does not help. Also there is a strange stuff like this:

Having configured the serial port to point at a unix socket, and timestamping
the messages on the host side, we observe

guest timestamp | host timestamp
        0.000000 | 20:09:26.805461
        2.587056 | 20:09:30.000993
        7.607329 | 20:09:35.062367
       12.645539 | 20:09:40.057634
       22.608054 | 20:09:50.028727
       32.395499 | 20:10:00.041215
       42.571265 | 20:10:10.041960
       47.606661 | 20:10:15.028973
       48.627059 | 20:10:20.022359
       49.029059 | 20:10:25.047857
       49.399065 | 20:10:30.066884
       49.809077 | 20:10:35.036467
       58.159132 | 20:10:40.013387
       68.043371 | 20:10:40.266714

Note the anomaly around 47 seconds from boot.

anyway, this story is far from the end and we are unable to provide more 
details. Roma is digging this story. Den

  reply	other threads:[~2016-02-04 11:00 UTC|newest]

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-02-03 21:46 [Qemu-devel] sda abort with virtio-scsi Jim Minter
2016-02-03 23:19 ` Paolo Bonzini
2016-02-03 23:34   ` Jim Minter
2016-02-04 10:23     ` Paolo Bonzini
2016-02-04 11:00       ` Denis V. Lunev [this message]
2016-02-04 13:41       ` Jim Minter
2016-02-04 13:54         ` Hannes Reinecke
2016-02-04 15:03           ` Paolo Bonzini
2016-02-04 15:11             ` Hannes Reinecke
2016-02-08 20:02         ` Jim Minter
2016-02-04  6:59   ` Hannes Reinecke
2016-02-04 11:27     ` Paolo Bonzini

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=56B32F48.5070804@parallels.com \
    --to=den-lists@parallels.com \
    --cc=hare@suse.de \
    --cc=jminter@redhat.com \
    --cc=pbonzini@redhat.com \
    --cc=qemu-devel@nongnu.org \
    --cc=rkagan@virtuozzo.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).