From: "Denis V. Lunev" <den-lists@parallels.com>
To: Paolo Bonzini <pbonzini@redhat.com>,
Jim Minter <jminter@redhat.com>,
qemu-devel <qemu-devel@nongnu.org>,
Hannes Reinecke <hare@suse.de>
Cc: Roman Kagan <rkagan@virtuozzo.com>
Subject: Re: [Qemu-devel] sda abort with virtio-scsi
Date: Thu, 4 Feb 2016 14:00:24 +0300 [thread overview]
Message-ID: <56B32F48.5070804@parallels.com> (raw)
In-Reply-To: <56B326B4.1020407@redhat.com>
On 02/04/2016 01:23 PM, Paolo Bonzini wrote:
>
> On 04/02/2016 00:34, Jim Minter wrote:
>> I was worried there was
>> some way in which the contention could cause an abort and perhaps thence
>> the lockup (which does not seem to recover when the host load goes down).
> I don't know... It's not the most tested code, but it is not very
> complicated either.
>
> The certain points that can be extracted from the kernel messages are:
> 1) there was a cancellation request that took a long time, >20 seconds;
> 2) despite taking a long time, it _did_ recover sooner or later because
> otherwise you'd not have the lockup splat either.
>
> Paolo
>
>>> Firing the NMI watchdog is fixed in more recent QEMU, which has
>>> asynchronous cancellation, assuming you're running RHEL's QEMU 1.5.3
>>> (try /usr/libexec/qemu-kvm --version, or rpm -qf /usr/libexec/qemu-kvm).
>> /usr/libexec/qemu-kvm --version reports QEMU emulator version 1.5.3
>> (qemu-kvm-1.5.3-105.el7_2.3)
My $0.02, for the record: this could be related, or it could be unrelated.

Speaking of NMIs, we do observe similar problems on older AMD hosts when the
guest has a large enough number of VCPUs. On a simple boot we see something
like the trace below (the probability of hitting the problem is around 1/10).
RHEV 2.3 QEMU is used, and the same kernel is running on the host node:
[ 72.189005] Kernel panic - not syncing: softlockup: hung tasks
[ 72.189005] CPU: 5 PID: 593 Comm: systemd-udevd Tainted: G L ------------ 3.10.0-327.4.4.el7.x86_64 #1
[ 72.189005] Hardware name: Red Hat KVM, BIOS seabios-1.7.5-11.vz7.2 04/01/2014
[ 72.189005] ffffffff81871a03 0000000050291887 ffff88007fd43e18 ffffffff8163515c
[ 72.189005] ffff88007fd43e98 ffffffff8162e9d7 0000000000000008 ffff88007fd43ea8
[ 72.189005] ffff88007fd43e48 0000000050291887 0000000000002710 0000000000000000
[ 72.189005] Call Trace:
[ 72.189005] <IRQ> [<ffffffff8163515c>] dump_stack+0x19/0x1b
[ 72.189005] [<ffffffff8162e9d7>] panic+0xd8/0x1e7
[ 72.189005] [<ffffffff8111b376>] watchdog_timer_fn+0x1b6/0x1c0
[ 72.189005] [<ffffffff8111b1c0>] ? watchdog_enable+0xc0/0xc0
[ 72.189005] [<ffffffff810a9d42>] __hrtimer_run_queues+0xd2/0x260
[ 72.189005] [<ffffffff810aa2e0>] hrtimer_interrupt+0xb0/0x1e0
[ 72.189005] [<ffffffff816471dc>] ? call_softirq+0x1c/0x30
[ 72.189005] [<ffffffff81065d40>] ? flush_tlb_func+0xb0/0xb0
[ 72.189005] [<ffffffff81049537>] local_apic_timer_interrupt+0x37/0x60
[ 72.189005] [<ffffffff81647e4f>] smp_apic_timer_interrupt+0x3f/0x60
[ 72.189005] [<ffffffff8164651d>] apic_timer_interrupt+0x6d/0x80
[ 72.189005] <EOI> [<ffffffff812f27e9>] ? free_cpumask_var+0x9/0x10
[ 72.189005] [<ffffffff810e6de2>] ? smp_call_function_many+0x202/0x260
[ 72.189005] [<ffffffff81065d40>] ? flush_tlb_func+0xb0/0xb0
[ 72.189005] [<ffffffff810e6e9d>] on_each_cpu+0x2d/0x60
[ 72.189005] [<ffffffff81066119>] flush_tlb_kernel_range+0x59/0xa0
[ 72.189005] [<ffffffff811a4730>] __purge_vmap_area_lazy+0x1a0/0x210
[ 72.189005] [<ffffffff811a4927>] vm_unmap_aliases+0x187/0x1b0
[ 72.189005] [<ffffffff810620e8>] change_page_attr_set_clr+0xe8/0x4d0
[ 72.189005] [<ffffffff81131127>] ? ring_buffer_time_stamp+0x7/0x10
[ 72.189005] [<ffffffff8106299f>] set_memory_ro+0x2f/0x40
[ 72.189005] [<ffffffff8162f7c4>] set_section_ro_nx+0x3a/0x71
[ 72.189005] [<ffffffff810ed19c>] load_module+0x103c/0x1b50
[ 72.189005] [<ffffffff810e9743>] ? copy_module_from_fd.isra.42+0x53/0x150
[ 72.189005] [<ffffffff810ede66>] SyS_finit_module+0xa6/0xd0
[ 72.189005] [<ffffffff816458c9>] system_call_fastpath+0x16/0x1b
[ 72.189005] Shutting down cpus with NMI
Changing the timesource does not help. There is also some strange behaviour:
having configured the serial port to point at a unix socket, and timestamping
the messages on the host side, we observe
guest timestamp | host timestamp
       0.000000 | 20:09:26.805461
       2.587056 | 20:09:30.000993
       7.607329 | 20:09:35.062367
      12.645539 | 20:09:40.057634
      22.608054 | 20:09:50.028727
      32.395499 | 20:10:00.041215
      42.571265 | 20:10:10.041960
      47.606661 | 20:10:15.028973
      48.627059 | 20:10:20.022359
      49.029059 | 20:10:25.047857
      49.399065 | 20:10:30.066884
      49.809077 | 20:10:35.036467
      58.159132 | 20:10:40.013387
      68.043371 | 20:10:40.266714
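As an aside, a capture setup like the one described above can be sketched
roughly as follows. The chardev id, socket path, and exact QEMU flags here
are illustrative assumptions, not copied from the report:

```shell
# Guest serial console -> unix socket (illustrative QEMU flags):
#   -chardev socket,id=ser0,path=/tmp/console.sock,server=on,wait=off \
#   -serial chardev:ser0
#
# Host side: prefix every console line with the host wall-clock time.
stamp() {
    while IFS= read -r line; do
        # GNU date: %6N gives microsecond resolution.
        printf '%s | %s\n' "$(date +%H:%M:%S.%6N)" "$line"
    done
}

# A live capture would be:  socat -u UNIX-CONNECT:/tmp/console.sock - | stamp
# Demo on canned input instead of a live socket:
printf '[    0.000000] Booting Linux\n' | stamp
```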
Note the anomaly around 47 seconds from boot.
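The drift is easy to quantify (this analysis is ours, not part of the original
report): a small Python sketch that replays the table above and prints, for
each interval, how much time passed in the guest versus on the host.

```python
from datetime import datetime

# (guest uptime in seconds, host wall-clock time) pairs copied from the
# serial-console table above.
samples = [
    (0.000000, "20:09:26.805461"),
    (2.587056, "20:09:30.000993"),
    (7.607329, "20:09:35.062367"),
    (12.645539, "20:09:40.057634"),
    (22.608054, "20:09:50.028727"),
    (32.395499, "20:10:00.041215"),
    (42.571265, "20:10:10.041960"),
    (47.606661, "20:10:15.028973"),
    (48.627059, "20:10:20.022359"),
    (49.029059, "20:10:25.047857"),
    (49.399065, "20:10:30.066884"),
    (49.809077, "20:10:35.036467"),
    (58.159132, "20:10:40.013387"),
    (68.043371, "20:10:40.266714"),
]

def host_seconds(ts: str) -> float:
    """Convert an HH:MM:SS.ffffff string to seconds since midnight."""
    t = datetime.strptime(ts, "%H:%M:%S.%f")
    return t.hour * 3600 + t.minute * 60 + t.second + t.microsecond / 1e6

# For each interval, compare elapsed guest time with elapsed host time.
ratios = []
for (g0, h0), (g1, h1) in zip(samples, samples[1:]):
    guest_dt = g1 - g0
    host_dt = host_seconds(h1) - host_seconds(h0)
    ratios.append(guest_dt / host_dt)
    print(f"guest +{guest_dt:7.3f}s  host +{host_dt:7.3f}s"
          f"  ratio {guest_dt / host_dt:6.3f}")
```

The ratio collapses to roughly 0.08 in the intervals around the 47-50 s guest
marks and then spikes to about 39 in the last one, i.e. the guest clock first
nearly stops and then jumps forward to catch up with real time.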
Anyway, this story is far from over, and we are unable to provide more
details at the moment. Roma is digging into it. Den
Thread overview: 12+ messages
2016-02-03 21:46 [Qemu-devel] sda abort with virtio-scsi Jim Minter
2016-02-03 23:19 ` Paolo Bonzini
2016-02-03 23:34 ` Jim Minter
2016-02-04 10:23 ` Paolo Bonzini
2016-02-04 11:00 ` Denis V. Lunev [this message]
2016-02-04 13:41 ` Jim Minter
2016-02-04 13:54 ` Hannes Reinecke
2016-02-04 15:03 ` Paolo Bonzini
2016-02-04 15:11 ` Hannes Reinecke
2016-02-08 20:02 ` Jim Minter
2016-02-04 6:59 ` Hannes Reinecke
2016-02-04 11:27 ` Paolo Bonzini